CPU
COM 6 
2p 
1600 x 1200 1280 x 1024
: , : — 24-bit prog. 48-bit prog. 
EPIC™ XE-900 10/100 Base-T Dual 10/100 Base-T
PC/104 & Plus PC/104 & Plus 
1.0 GHz CPU 3.6A operating 1.6A max.
Temp. range 40" to 70/85° C =40° 16. 60" 








Need Linux, QNX, Windows®? 
Try our OS EMBEDDER™ KITS


Our kits are the shortest path to 
a successful OS on an Octagon 
embedded computer. 


• Pick your Octagon SBC
• Pick the OS you prefer: Linux,
  Windows, QNX


Octagon delivers a high 
performance, total solution. 








Try our XBLOKs





XBLOKs offer the best compromise 
in cost and function for both PC/104 
and PC/104-Plus. Only 44% the size 
of a standard PC/104 card, you can 
add two functions to your system 
but increase the stack height by 

only one level. -40° to 85° C. Heat
diagram shows enhanced cooling. 


NEW ¢ 








Designed for the XE-900, 
our conduction cooling system 
eliminates a fan even at 1.0 GHz.


OCTAGON 


2 MB high speed, SRAM 


Read and write at full bus speed 
Pointers to memory saved if CPU 
resets or loses power


48 digital I/O, 5V compatible


Source and sink 16 mA per output 
Direct connection to 
opto-module racks 


Up to 230.4 kBaud data rate 


Supports RS-232/422/485
RS-485 fault protected to +60V


AIN 
10/100 Base-T, Intel 82551ER
Fully plug-n-play
High performance, 
PCI bus interface 


Speeds up to 480 Mbps


Mix and match USB 1.1 and 2.0
Current-limited ports can supply
500 mA to external devices 


For a full listing of
Octagon Systems
products, visit us at

SBS knows AdvancedMC. Choose from our large selection of real, available products.


SBS DOESN'T JUST ANNOUNCE 
AdvancedMC Modules, we build them. 
And you're looking at real products. 
For a long time, a modular, open
telecom standard was just a dream. SBS
Technologies is making it a reality.

In less than one year SBS has introduced
and built more than a dozen AdvancedMCs
and a number of carriers. We know it can
be tough to keep up with all the progress
AMCs are making, so we created the
AdvancedMC Insider monthly newsletter
with the latest AMC news. To subscribe, go to
www.advancedmcinsider.com.








SBS knows. 


PRODUCT              DESCRIPTION
TELUM TSPEO1         Processor AMC module with PowerPC® 7447A processor
TELUM ASLP10         Intel® Pentium® M processor AMC module
TELUM 624/628-TEJ    WAN Edge Access I/O modules 4 or 8 port T1/E1/J1
TELUM 1001-012M/S    WAN OC-12 module
TELUM 1001-03        WAN OC-3 module
TELUM 1004-O3M/S     WAN OC-3 module
TELUM 1001-DE        WAN DS3/E3 module
TELUM 1204-03        WAN intelligent AMC.2 multi-service 4-port OC-3 module
TELUM GE-QT          Gigabit Ethernet AMC 4 port NIC
TELUM FC2312-FF      Fibre Channel HBA cards (fiber-optic media)
TELUM FC2312-CC      Fibre Channel HBA cards (copper media)
AT-AMC1              AdvancedTCA® carrier for 2-4 AMC.1 module
AT-AMC2              AdvancedTCA® carrier for 2-4 AMC.2 module
BCT4-AMC1            IBM® BladeCenter® T carrier for 4 AMC module
TELUM GPSTC-AMC      GPS-based clock AMC module
TELUM 2001-VGA       AMC VGA module


Find the AdvancedMC product you’re looking for at www.sbs.com/amce or call 800.SBS.EMBEDDED
10 Industry Insider


XMCs Bring Digital Video Standards to High-Performance Applications 
Stephane Joanisse, Curtiss-Wright Controls Embedded Computing 





FPGAs Revolutionize Mezzanine Boards 


Rodger H. Hosking, Pentek


iSCSI Brings Distributed Storage to Mezzanine Clients
Stan McClellan, SBE

XMC cards can be configured with up to
six connectors, including the Pnd fabric
connector and the Pn6 XMC connector for
mezzanine I/O. Pg. 14


Solutions Engineering PCI Express and InfiniBand 


The PCI Express-InfiniBand Connection 
Jack Regula, PLX Technology 


Accelerate Your Applications with Hard Real-Time and 
Reliable I/O over InfiniBand 


Sujal Das, Mellanox Technologies 


Industry Insight Machine-to-Machine


Communicating Machines Are Triggering an Embedded Revolution 
Bob Burckle, WinSystems and Steve Pazol, nPhase 


Internet Protocols Ease Development Cost and Time for 
M2M Communication 
Alan Singer, ConnectOne 





Communication options for IP-enabling an
M2M application. Pg. 47

Executive Interview


53 RTC Interviews Bill Kehret, CEO Themis Computers 


Software & Development Tools MILS and FIPS for Security 
59 MILS Middleware for Secure Distributed Systems 


Gordon Uchenick, Objective Interface Systems 


Industry Watch 


68 Competition Heats Up in the Battle for COTS Processing Technologies 
Neil Harold, Nallatech 





71 Flexible Multicore Pipeline Infrastructure Reduces Software Pain
Bryon Moyer, Teja Technologies

Hot-Swap, 12-Port 10 Gbit/s Fibre
Channel/Ethernet/iSCSI Card. Pg. 62





Fleet management is a prime example of a machine-to-machine application where autonomous 
nodes exchange information about truck location, freight load, routing and freight waiting at 
terminals, allowing dispatchers to dynamically monitor and control the system. 

Photos courtesy of Transics N.V.

Operate and
survive under
the most extreme 
conditions with 

ruggedized E-Disk® 
solid-state flash drives and 
network storage solutions. 
BiTMICRO’s cutting-edge 

storage technologies offer utmost 
reliability, optimum data security and
unmatched performance.


Ethernet | Fibre Channel | SCSI | IDE/ATA 
USB | FireWire | cPCI VME | SATA | iSCSI 
PCI-X | PCI Express | SAS | Infiniband 


BiTMICRO Networks, Inc. www.bitmicro.com
45550 Northport Loop E info@bitmicro.com
ULTIMATE STORAGE SOLUTIONS™ Fremont, CA 94538-6481 510-743-3475























MISSION CRITICAL....... 


data storage modules 





Extreme Comprehensiveness: We offer the most comprehensive VME/cPCI 
storage product line in the world, offering device alternatives for 
any standard or unique application. 
• Solid State Disk • Removable Hard Disk
• Tape Drives • Optical Disk • PCMCIA Adapter
Extreme Performance: Our VME products feature extreme speed, capacity and
rugged reliability with 320 MB/sec throughput enabled by
LVD SCSI technology, storage capacity of more than 600 GB
per module and a 1,400,000 hour MTBF.
Extreme Quality: Phoenix International is the only
manufacturer of VME data storage products that is
ISO 9001:2000 Certified.
Phoenix International Systems, Inc. An ISO 9001:2000 Certified SDVOSB
714-283-4800 • 800-203-4800 • www.phenxint.com
GE Fanuc Automation

Embedded Performance.

Looking for an embedded computing solution that gives you a
tremendous advantage over your competition? Look no further
than GE Fanuc Embedded Systems.

Featuring a comprehensive offering that includes Intel-based
SBCs and complete I/O systems, industry-leading communications
technology, rugged flat panel monitors and computers and more,
GE Fanuc Embedded Systems can support your full range of
embedded computing needs to solve your greatest challenges.
From standard product requests to a solution that is quickly and
fully customized to your specific application, GE Fanuc Embedded
Systems has the breadth, depth and support capabilities to provide
a serious boost to your performance.
ATCA-7820
AdvancedTCA Dual Core Intel Xeon
Processor LV 2.0 GHz Processor Node Board
• Combines two dual core processors,
  providing four high performance cores
• PICMG 3.0/3.1 compliant
• Processor speeds up to 2.0 GHz
• Up to 8 GB DDR-2 SDRAM with ECC
• AMC.1 compliant site (PCI Express x8)
• Single PCI-X PMC site at 64-bit/66 MHz
• Single 10/100 Ethernet interface
• Dual serial ports
• Four USB 2.0 ports
• Serial ATA interface
• Optional 2.5-inch IDE hard disk drive
• OS support for Windows XP, Windows
  2000, and Carrier Grade Linux





Embedded Systems 





CPCI-7808
Intel Pentium M CompactPCI
Single Board Computer
• PICMG 2.16/2.9 compliant
• Processor speeds up to 1.8 GHz
• Up to 2 GB DDR SDRAM
• Dual PMC sites
  - 64-bit/66 MHz site
  - 32-bit/33 MHz site
• Dual 10/100/1000 Ethernet interface
• Dual 16550-compatible serial ports
• Three USB 2.0 ports
• Serial ATA interface
• Up to 1 GB CompactFlash
• OS support for Windows XP, Windows
  2000, QNX, Linux, and VxWorks


Learn more at www.gefanuc.com/embedded 





CP920
CompactPCI Managed
Gigabit Ethernet Switch
• PICMG® 2.16 compliant
• Layer 2/3/4 switching
• Twenty-four 10/100/1000 Ethernet ports
• PICMG® 2.9 Rev 1.5 IPMI compliant
• PICMG® 2.1 Rev 2.0 hot swap compliant
• 802.1p, 802.1Q VLAN, deep packet filtering,
  link aggregation, Rapid Spanning
  Tree (802.1w, 802.1d), broadcast storm
  control, port mirroring
• Conduction cooled model available
  - Twelve 10/100/1000 Ethernet ports





PMC-0247
Serial ATA Hard Disk Drive Module
• 40 Gbyte or 80 Gbyte options available
• Support for SATA I (150 Mbytes/s) and
  SATA II (300 Mbytes/s) interfaces
• Support for programmable External
  Flash for BIOS expansion
• Supports 32/64-bit, 133 MHz maximum
  PCI-X interface
• Fast read/write performance
• VITA 39 compliant
• OS support for Windows XP, Windows
  2000, Red Hat Linux, and Enterprise 4.0


©2006 GE Fanuc Automation. All rights reserved. 





Editorial 
June 2006 


The Modular Model Has 
Modified Methodologies 


by Tom Williams, Editor-in-Chief 


There’s this expression we bandy about every day. We apply it
to all sorts of things and situations without stopping to consider
what it implies. The words “module” and “modular”
are used throughout both technology and everyday life. We talk 
about modular furniture, modular construction and, of course, 
modular system design. It stems from the human ability to hide 
complexity behind abstraction, and it gives us the ability to con- 
struct truly complex works of art and technology. To use that 
overworn cliché, modularity keeps us from having to constantly 
“re-invent the wheel.” 

In the embedded computer arena, where often very complex 
systems are nonetheless dedicated to very specialized tasks, the 
ability to think in and work with modules makes it possible to 
work at the functional level of building blocks, which may them- 
selves have very complex internal workings. While this may all 
seem obvious, think about how it affects system design. A “module”
has three basic characteristics: what it does, how it connects
and how/where it fits.

What a module does is simply its functionality. It performs 
CPU tasks, DSP functions; it is a switch; it pre-processes I/O 
data and so on. This is quite separate from the issue of how it 
performs its function. That is an issue of little interest to the 
developer building a modular system. How well it performs is,
of course, a more important question because system developers
select modules based not only on their function but also on
their performance, cost, power consumption, heat dissipation and 
size/form-factor, to name several important criteria. These are all 
also external characteristics dictated by the internal engineering 
of the module. 

The job of the modular system developer is not to re-engineer
those internals but to select the optimum mix of
characteristics for his or her purposes. If no satisfactory combination
is available, then the heavy-duty specialized engineering
will be brought in to address the problem, bringing up the classic 
“build or buy” decision. 

How a module connects refers, of course, to the interfaces. 
These consist of such things as the bus architecture, I/O, serial 
fabric interconnects, mezzanine sites and front panel ports. The 
interfaces also increasingly consist of APIs for embedded operat- 
ing systems, middleware and communication protocols. This has 
come about because vendors realize that offering a selection of 
operating systems and pre-qualified board support packages adds 
to the value of their product and reduces the customer’s time-to- 
market. This level of software support has become an intrinsic 
aspect of modular design. Modules also now consist of intellec- 
tual property that is integrated in FPGAs, object-oriented soft- 
ware libraries, protocol stacks and middleware. Many of these 
are delivered pre-integrated on other modules. 

Where and how a module fits addresses the issue of integrating
it with other modules to form a system, i.e., the system
architecture level. For example, do we go with a large number of
functions integrated on one or two boards, or do we use a larger 
number of single-function modules to retain flexibility? What 
will be the size, power consumption, performance and cost of the 
whole shebang? 

All of these are intense and important engineering issues. 
The point is that they have clear boundaries and intersection
points (i.e., interfaces) and these boundaries form a hierarchy that
has allowed the creation of huge communication systems, space
travel, medical advancements and industrial production, because
the ability to think modular lets us work large.

Microsoft Previews Windows CE 6 Operating System


Microsoft has announced the availability of a beta release of Windows CE 6, the next genera- 
tion of its real-time software. The new version features a redesigned operating system kernel 
architecture, expanded capacity for simultaneous processes and a newly integrated tool set. 
Windows CE 6 delivers a more integrated embedded development environment that is now available
via a plug-in for Visual Studio 2005. Now developers have a single familiar tool to help them
quickly develop both operating systems and applications, enabling them to improve time-to-market
and reduce development costs. Visual Studio 2005 enables over 7 million Visual Studio developers
worldwide to use their existing tools and skills to create differentiated embedded devices.

The redesigned operating system kernel architecture supports significantly more simultane- 
ously running processes, from 32 up to 32,000 simultaneous processes, each of which can run 
in a 2 Gbyte virtual memory address space. Windows CE 6 provides continuity of features and 
functionality from previous generations of Windows CE, allowing device makers to utilize their 
previous investments in user interfaces, applications, middleware and drivers. 

Windows CE 6 builds on Microsoft’s Shared Source Initiative, offering developers extensive 
access to millions of lines of Windows CE source code. Developers and device makers have the 
right to modify and distribute custom components with their Windows CE-based products. The 
Shared source code license also includes a flexible template that lets device makers create 
unique, customized user interfaces to further differentiate their devices. 


Industry Group to Establish
SP100 Industrial Wireless
Standard

Adaptive Instruments, Endress+Hauser, Flowserve, Honeywell,
OMNEX Control Systems, 3e Technologies International (3eTI)
and Yokogawa announced they have joined the Instrumentation,
Systems and Automation Society (ISA) SP100 working group to
support the committee’s efforts to create an open industrial and
multi-functional wireless standard. This industry group will work
toward a joint solution that will enable industrial plants to use a
single wireless network architecture to support a wide range of
applications from low-rate monitoring to process control to
wireless worker functions. These industry group members are
developing a network structure that is based on existing wireless
technologies to meet current and future customer requirements.
The SP100 committee has identified needs for managing devices,
from just a few up to tens of thousands of devices residing on a
single, scalable, wireless plant network. The network architecture
will provide integrated end-to-end security that will guard against
cyber threats, as well as enable companies to reduce the number
of disparate wireless networks used at a plant, which also reduces
operational costs and disruptions.

Industrial wireless has many complex issues when compared to
traditional wired offerings and needs a comprehensive solution to
be successful in the long run. The industry group members are
drawing from their current wireless industrial plant experiences as
well as extensive research to select the best wireless technologies.
The intent is to provide an open, standard environment for
industrial vendors to develop interoperable products to provide
their customers the ability to choose from many products to use in
their plants. The industry group will develop this solution within
the ISA-SP100 committee, which was chartered last year “to
establish standards, recommended practices, technical reports and
related information that will define procedures for implementing
wireless systems in the automation and control environment.”


RapidIO Interoperability
Lab Establishes Qualified
Vendor Program

RIOlab Corporation, a new interoperability testing facility for
RapidIO, announced it has established a Qualified Vendor
Program (QVP) designed to ensure its newly established lab
utilizes the most advanced equipment to perform interoperability
and specification compliance testing. RIOlab also announced the
first four qualified vendors: Fabric Embedded Tools Corporation
(FET), Nexus Technology, Silicon Turnkey Express (STx) and
Tektronix. RIOlab runs a state-of-the-art testing facility that
provides independent device interoperability and specification
compliance reporting that meets the growing needs of silicon
vendors and OEMs using the RapidIO interconnect standard.

The QVP creates a partnership opportunity for RIOlab and RapidIO
ecosystem vendors.
Ecosystem vendors value the 


opportunity to be showcased within 
RIOlab, where they achieve greater 
visibility of the interoperability 
of their products and equipment. 
RIOlab benefits from the breadth 
of RapidIO ecosystem participation 
and products while becoming a 
world-class testing facility. The 
QVP participants have contributed 
hardware or software and have 
demonstrated compliance to the 
RapidIO specification. In addition, 
these vendors have agreed to 
work with RIOlab to provide any 
upgrades or new products, ensuring 
that the lab is always using state-of- 
the-art technology. Tektronix and 
Nexus Technology are delivering 
logic analyzers and bus supports, 
FET is providing software tools, 
and STx is contributing hardware. 


VMetro and AdvancedIO
Collaborate on 10 GbE Data
Recording Systems
AdvancedIO Systems and
VMetro have announced an
agreement to jointly develop and 
commercialize 10 Gbit Ethernet 
(10 GbE) data recording, playback 
and analysis systems. Suited for 
digital signal analysis applications 
as well as medical scanners and 
industrial inspection systems, 
these 10 GbE recorder systems 
will utilize AdvancedIO’s 10 
GbE PMC/XMC family of packet 
processing modules and VMetro’s 
Vortex Open Recorder platforms. 
The products within VMetro’s 
Vortex family of real-time data 
recorders for analog and digital 
applications share a common design 
philosophy that allows mixing and 
matching of I/O, recording engines, 
storage and analysis to meet a wide 
array of program requirements. 
The Vortex family includes an 


Get Connected with companies mentioned in this article. 
with Graphical System Design
> Control design > I/O modules and drivers > Rugged deployment platforms
> Intellectual property libraries > COTS FPGA hardware > Distributed networking
> Digital filter design > VHDL and C code integration > Human-machine interfaces
> Dynamic system simulation > Design validation tools > Firmware management


Graphical System Design 
Accelerate your embedded design using National Instruments LabVIEW graphical 
programming, third-party tools, and commercial off-the-shelf hardware. Graphical system 


design empowers you to rapidly design, prototype, and deploy embedded systems. 


“With graphical system design through NI LabVIEW 
and CompactRIO, we designed a motorcycle ECU
prototyping system in three months versus two-and- 
a-half years with traditional tools.” 


— Carroll Dase, design engineer 
Drivwven, Inc. 


Learn how to design faster through Webcasts by Analog Devices, 


Celoxica, and Maplesoft at ni.com/embedded. 


© 2006 National Instruments Corporation. All rights reserved. CompactRIO, LabVIEW, National Instruments, NI, and ni.com are trademarks of 
National Instruments. Other product and company names listed are trademarks or trade names of their respective companies. 7225-821-101 
Event


Calendar 


07/08-12/06 
AIAA/ASME/SAE/ASEE 
Joint Propulsion Conf. 
Cincinnati, OH 
www.aiaa.org 


07/11-13/06 
Microsoft Worldwide 
Partner Conference 
Boston, MA 
www.microsoft.com 


07/17-23/06 
Farnborough Int’l
Airshow 2006 
Farnborough, UK 
www.farnborough.com 


07/24-28/06 

Design Automation 
Conference — DAC 

San Francisco, CA 
www.dac.com 


08/17-18/06 
Embedded Systems 
Conference Taiwan 
Taipei, Taiwan 
www.esconline.com/asia 


08/22/06 
Real-Time & Embedded 
Computing Conference 
Detroit, MI 
www.rtecc.com/detroit 


08/24/06 
Real-Time & Embedded 
Computing Conference 
Toronto, ON 
www.rtecc.com/toronto 


09/12/06 

Real-Time & Embedded 
Computing Conference 
Calgary, AB 
www.rtecc.com/calgary 


09/15/06 

Real-Time & Embedded 
Computing Conference 
Vancouver, BC 
www.rtecc.com/vancouver 


If your company produces 
any type of industry event, 
you can get your event listed 


by contacting sallyo@rtcgroup.com. 
This is a FREE industry-wide listing. 
open architecture with custom I/O and targeted (pre-programmed
for common I/O) recording and playback products for standard buses
including VME, CompactPCI and PC. Vortex recorders have a storage
system that takes advantage of the power, flexibility and scalability
of a Fibre Channel-based Storage Area Network (SAN). Vortex also
includes SAN Access Kit workstation connectivity that allows direct
high-speed access to the digital data from almost any commercially
available processing platform.

AdvancedIO 10 GbE configurable I/O products offer rich
connectivity and packet processing performance in conjunction with
application flexibility. The architecture provides the optimal
platform for implementing advanced Layer 2 and Layer 3 packet
switching solutions at 10 Gbit/s line speed.


SCSI Trade Association
Holds Sixth Plugfest

The SCSI Trade Association (STA), a member-run industry
association established to support and promote SCSI technology,
and the University of New Hampshire InterOperability Laboratory
(UNH-IOL), announced that the sixth Serial Attached SCSI (SAS)
plugfest was held April 24-28. The tests emphasized large system
builds and the performance of fabrics during attempts to disrupt
the data flow. Fifteen STA members attended.

The highlight of the plugfest was the array of large system builds
(of which there were several), configured as the largest, most
complex fabric to date. The system builds contained many servers,
HBAs, expanders, enclosures, analyzers, hard disk drives, tape
drives and an active multiplexer (MUX), provided by the attending
vendors. All of the large builds were stable configurations and
were successfully tested by hot-plugging and swapping drives.
These tests verified that broadcast change notifications did not
disrupt the fabric, which would cause the system to stop
transferring data.

SAS market entry is the result of four years of preparation,
including development of the SAS technology, finalizing and
standardization of the specifications, testing, and finally the
anticipated introduction of complete SAS storage systems last fall.
Testing for SAS component interoperability during development and
early product introductions has virtually eliminated
interoperability and compatibility problems. An additional plugfest
is planned for fourth quarter.


ZigBee and BACnet Link
Up to Extend Standards to
Wireless Building Control

The ZigBee Alliance, a global ecosystem of companies creating
wireless solutions for use in home, commercial and industrial
applications, has announced a new collaboration with BACnet, a
leading protocol for wired commercial building automation,
establishing interoperability between the two technologies. This
agreement will allow building operators relying on existing wired
BACnet infrastructure to confidently add wireless devices to their
existing building control systems by using ZigBee technology to
increase safety, security and convenience while saving money on
utilities.

Efforts are currently underway to integrate functionality of both
technologies with the drafting of interoperability proposals for
the two protocols. The BACnet and ZigBee teams will hold joint
meetings in the coming months and begin implementing the
collaborative plans of this partnership.

BACnet currently supports five wired data links, with ZigBee
becoming its first wireless link. This cooperation will allow
existing wired BACnet devices to be quickly upgraded to use ZigBee
devices on the Building Control network. BACnet data types and
commands will be added to the ZigBee Commercial Building profile,
which will allow BACnet messages to be encapsulated into ZigBee
wireless data packets. This will allow for both BACnet and ZigBee
gateways and intrinsic ZigBee devices to preserve much of BACnet’s
underlying object model.


Remote Device
Management for Rapid
Post-Deployment Diagnosis
and Repair

Wind River Systems will soon introduce its Management Suite, a
new solution for managing deployed devices that will incorporate
remote diagnostics services built around a scalable device
management platform. Expanding on its device software optimization
(DSO) strategy to incorporate post-development device management,
the management suite will include the new Wind River Field
Diagnostics, a remote diagnostics system and an upgraded version of
Wind River Workbench Diagnostics, a root cause analysis system with
workgroup collaboration facilities. With the management suite,
device manufacturers will be able to remotely monitor, diagnose
and repair deployed devices.

Wind River Field Diagnostics is a distributed, remote diagnostics
system that securely collects and manages operational information
from thousands of devices deployed at customer sites. It is built
on a standard, secure tiered-architecture that leverages relational
databases, J2EE application servers and Web services technology for
enterprise-class scalability. Wind River Field Diagnostics also
helps reduce call times and service costs, enables more competitive
and auditable Service Level Agreements (SLA) and allows more
productive and consistent global service operations.

Wind River’s Workbench Diagnostics is an Eclipse-based analysis
system and allows design engineers to dynamically instrument device
software in order to rapidly isolate, diagnose and correct software
defects in running systems. As part of the Wind River Management
Suite, the newly enhanced Workbench Diagnostics is tightly
integrated with Field Diagnostics to offer self-management
features, support for additional processor architectures and a new
workgroup configuration. The new workgroup features help developers
collaborate to create and manage the libraries of diagnostic
instrumentation, patches and procedures used to monitor and test
multiple devices.
Mezzanine Soup to Nuts


XMCs Bring Digital Video Standards to High-Performance Applications


As high-performance subsystems must increasingly drive high-resolution 
video displays, XMC and serial digital video open standards are providing 
cost-effective solutions. 


by Stephane Joanisse, Curtiss-Wright 


Controls Embedded Computing 


High-performance embedded subsystems are increasingly required
to drive high-resolution video displays. Unfortunately, many
existing subsystems, such as those used in aerospace and defense,
lack video controllers that provide the requisite digital interface
to support the high-speed, high-bandwidth digital signals needed
to drive these displays.

Interface type   Approximate year of       Maximum data rate
                 introduction/definition   (MBytes/s)
PCI 32/33        1993                       132
AGP 1x           1997                       264
AGP 2x           1997                       528
AGP 4x           1998                      1056
AGP 8x           2002                      2112
PCIe 1x          2004                       250
PCIe 2x          2004                       500
PCIe 4x          2004                      1000
PCIe 8x          2004                      2000
PCIe 16x         2004                      4000

GPUs on PMC, max data rate: 264 MBytes/s (note 1)
GPUs on XMC, min data rate: 1000 MBytes/s; GPU on XMC: 1000, 2000 MBytes/s (note 2)

Note 1: PCI at 32bit/66MHz is the most common for GPUs on PMC and is equivalent to AGP 1x.
Note 2: Proposed XMC will most likely use either 4 lanes or 8 lanes.

Evolution of graphics interface performance over time.

The migration of high-end digital video standards from commercial
applications into the aerospace and military markets has been
occurring for some time in naval systems. These are better able
to rapidly adopt commercial components because they usually do not
require the high level of ruggedization demanded by ground vehicle
and aerospace platforms. Naval environments are also typically free
from the space, weight, heat, shock and vibration limitations found
in vetronics and avionics platforms, which helps ease the
adoption of commercial technologies.

As a result, it is in naval systems that 
digital video has found the most rapid ac- 
ceptance. But, as digital video becomes 
more ubiquitous, it is making inroads into 
harsh, demanding environment applica- 
tions as well. 
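
The peak rates in the graphics interface table can be reproduced with simple arithmetic: bus width in bytes times clock rate times transfers per clock for the parallel buses, and lane count times a per-lane rate for PCI Express. The following is a minimal Python sketch and is not taken from the article. The function names are illustrative, the 33 and 66 MHz clocks are nominal values, and the 250 MBytes/s per-lane figure is simply the table's 1x entry.

```python
def parallel_bus_rate(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak rate in MBytes/s for a parallel bus (nominal clock, no protocol overhead)."""
    return (width_bits // 8) * clock_mhz * transfers_per_clock


def pcie_rate(lanes, per_lane_mbytes=250):
    """Peak rate in MBytes/s, assuming 250 MBytes/s per lane as in the table."""
    return lanes * per_lane_mbytes


# Build the same columns as the table: PCI 32/33, AGP 1x-8x, PCIe 1x-16x.
rates = {"PCI 32/33": parallel_bus_rate(32, 33)}
for multiplier in (1, 2, 4, 8):
    # AGP is modeled here as a 32-bit, 66 MHz bus with 1, 2, 4 or 8 transfers per clock.
    rates[f"AGP {multiplier}x"] = parallel_bus_rate(32, 66, multiplier)
for lanes in (1, 2, 4, 8, 16):
    rates[f"PCIe {lanes}x"] = pcie_rate(lanes)

for name, rate in rates.items():
    print(f"{name:10s} {rate:5d} MBytes/s")
```

Under these assumptions a PMC-hosted GPU on 32-bit/66 MHz PCI tops out at 264 MBytes/s, the same as AGP 1x, while a four-lane XMC link already provides 1000 MBytes/s.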


Commercial Digital Video 
Interface Standards 

Recent years have witnessed an evo- 
lution in commercial video standards 
and interfaces (Figure 1). Digital video 
interfaces were initially used to provide 
the interface between a notebook and its 
display panel. Since these elements were
physically close to each other, there were
few concerns regarding drive length or interface
size.

These early interfaces primarily used 
low voltage differential signaling (LVDS) 
and various protocols, such as FlatLink 
and OpenLDI. They typically supported 
several pixel depths and either four or five 
differential pairs were used. LVDS in- 
terfaces were subsequently employed be- 
tween a PC and an external monitor where 
dual-link interfaces were used to support 
the need for higher resolutions. 

Since then, there has been a shift 
away from LVDS toward the Digital Vi- 
sual Interface (DVI) standard that is now 
common on PCs and most commercial 
electronics, such as TVs. DVI can drive 
higher resolutions and larger displays than 
is possible with LVDS. The DVI standard 
itself has evolved into the High Definition 
Multimedia Interface (HDMI) standard 
to support both the audio and video com- 
ponents for high-definition TV (HDTV). 
HDMI utilizes smaller connectors than 
DVI while maintaining DVI electrical 
compatibility. 


High-Speed Differential 
Signaling and PMCs 

The next step beyond DVI is the use 
of digital interface standards, such as that 
developed by the Society of Motion Pic- 
ture and Television Engineers (SMPTE), 
SMPTE-292M. This standard defines 
the method for transmission of HDTV 
formats, both interlaced and progressive, 
over a high-speed serial interface. It is 
now commonly used in television broad- 
cast centers. 

In comparison, while a single-link 
LVDS interface features five differential 
pairs, DVI has four pairs and SMPTE-292M
uses only a single differential pair.
The SMPTE-292M standard defines both 
electrical and fiber optic transmission of 
the video data using two differential sig- 
nal lines transmitted over coax or fiber. 
The positive signal uses one of the two 
transmission lines, while the negative sig- 
nal uses the other. 

PMC vs. XMC in Graphics Applications

Standard (physical form-factor)
  PMC: IEEE 1386 => Common Mezzanine Card; VITA 20 => Conduction Cooled PMC
  XMC: VITA 42.0 => XMC

Standard (electrical and interconnect)
  PMC: IEEE 1386.1 => Physical and Environmental Layers for PCI Mezzanine Cards: PMC
  XMC: VITA 42.1 => XMC Parallel RapidIO Protocol Layer;
       42.2 => XMC Serial RapidIO Protocol Layer;
       42.3 => XMC PCI Express Protocol Layer Standard;
       42.4 => HyperTransport Protocol Layer

I/O pin count
  XMC: 64 when only Pn4 is populated, 114 when only Pn6 is populated,
       178 when both Pn4 and Pn6 are populated. Actual I/O pin count will be
       lower since differential signaling standards often require the use of
       additional grounds in order to maintain appropriate signal integrity.
  Advantage: Up to 278% more I/O with XMC

Typical I/F Speed (Mbytes/s) [theoretical maximum]
  PMC: 264. Since most GPUs used on today’s PMCs are AGP-enabled, they
       therefore have a 32-bit, 66 MHz interface.
  XMC: 2000, if using a x8 interface; 1000, if using a x4 interface
  Advantage: Roughly 750% greater bandwidth

GPU Selection
  PMC: GPUs are a few generations older, support less graphics memory.
  XMC: GPUs can be “at par” with current commercial offerings, and support
       more graphics memory.

Although graphics processing units (GPUs) have improved the
performance of graphics mezzanine cards, XMC brings much faster I/O
performance compared to PMCs.


The key challenge with high-speed
differential signaling is maintaining adequate
signal integrity to achieve the upper
ends of the performance envelope. Unfor- 
tunately, today’s de facto standard embed- 
ded mezzanine module, the PMC card, is 
unsuitable for maintaining signal integrity 
in the conditions that must be withstood 
by systems used in rugged vehicle and 
aerospace applications. These conditions 
include a wide range of temperatures, 
shock and vibration, and otherwise demanding
environments.

In addition to signal integrity issues, 
the large display formats supported by 
the newer digital video standards, such 
as SMPTE-292M, mean increased data
movement between the base card and 
mezzanine to support textures, video capture
and other video operations that may
be required. With the PMC standard used
today, graphics mezzanines are typically
limited to comparatively low-bandwidth
PCI operation, restricting data path bandwidth
and inhibiting support for higher
resolutions.


Expanding Bandwidth 

In response to the limitations of PMC, 
the VITA Standards Organization (VSO) 
has developed the VITA 42 Switched Mezzanine
Card (XMC) standard. XMC offers
bandwidth greater than that of PMC’s native
PCI bus by providing support for up to
16 lanes of PCI Express. Meanwhile, new
host board form-factors, such as VITA 46
(VPX) and VITA 48 (VPX-REDI), offer
support for the new XMC modules.
XMC cards can be configured with up to six connectors, including the Pn5
fabric connector and the Pn6 XMC connector for mezzanine I/O.


In addition, their new high-bandwidth 
interconnects also provide significantly 
improved backplane bandwidth to support 
higher-speed distribution of the graphics 
data. The adoption of XMC modules and 
new host board standards promises to be a 
contributing factor to the success of inte- 
grating the new commercial digital video 
standards into high-performance embed- 
ded systems (Figure 2). 

Driving the need for the higher-speed 
and higher-resolution digital video inter- 
face is an increase in the amount of incom- 
ing digital video data and the use of larger 
high-resolution displays. The increased 
data results from a proliferation of sen- 
sors in the field, resulting in a profusion of 
real-time data that threatens to overwhelm 
designers of man-machine interfaces and 
command/control consoles. 

Today, it is common for an operator 
to be seated in front of a large display that 
is fed real-time information from mul- 
tiple sensors. With enough resolution and 
bandwidth, data from these multiple sen- 
sors can be displayed simultaneously, as 
well as side by side if need be. To keep 
up with this profusion of data, system de- 
signers are increasingly looking to state- 
of-the-art commercial digital video and 
serial digital graphics I/O. 

Another driver behind the growing 
need for higher-speed graphics interfaces 
is the growth of the networked battlefield.
Data needs to be shared across the network 
in real time, but the sensors generating the 
data may not be directly tied to where the 
data needs to be used. The advent of se- 
rial switched fabric technologies such as 
Advanced Switching Interconnect, Serial
RapidIO and PCI Express supports
distributed networking on the subsystem 
level and makes it considerably easier to 
get data from one end of the subsystem to 
the other. Using serial switched fabrics, it 
is now possible to get all of the sensor data 
into the subsystem, do something useful 
with it and then drive it out to a suitable 
display. 


XMCs in High-Resolution 
Digital Video Design 

XMCs are finding their way onto 
VME64x designs as well as new VPX/ 
VPX-REDI designs, such as Curtiss- 
Wright’s XMC-ready VPX6-185 SBC and 
the CHAMP AV-6 DSP engine. The XMC 
form-factor is backward-compatible with 
existing PMC sites. It can support the full 
complement of four PMC connectors de- 
fined by the PMC specification as well as 
the additional two high-speed connectors 
defined by the XMC standard. 

An XMC module is able to support 
anywhere from one (Pn5) to six (Pn1-6) 
connectors, depending upon its I/O and interface
requirements. Today, many XMCs
have the fabric connector (Pn5) but use the 
existing PMC Pn4 connector for the I/O, 
since this already has defined mappings to 
the base card backplane. Since the XMC 
standard provides backward compatibility 
for legacy PMC cards, as well as the two 
new XMC connectors, system designers 
can use PMCs for functionality that does 
not require the speed and bandwidth ad- 
dressed by XMCs. In addition, with both 
the PMC Pn4 connector and the XMC 
Pn6 connector available for mezzanine I/O,
the amount of I/O from the mezzanine
has dramatically increased (Figure 3). 

With its support for PCI Express, 
the XMC module, in its VITA 42.3 ver- 
sion, promises to be a popular mezzanine 
form-factor for deploying high-resolution 
digital video modules in demanding envi- 
ronments. Because an XMC can support 
anywhere from 1 to 16 lanes of PCI Express,
an increase in bandwidth of up to
16x between the base card and the mez- 
zanine is possible, compared to existing 
PMC PCI interfaces. 
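
The scaling can be checked with a few lines of arithmetic. The sketch below assumes the theoretical per-direction figures quoted in this article, 250 Mbytes/s per PCI Express lane and 264 Mbytes/s for 32-bit/66 MHz PCI. The function and constant names are illustrative only.

```python
# Theoretical peak rates in Mbytes/s, taken from the figures quoted in the article.
PCI_32BIT_66MHZ = 264   # PMC's native PCI bus
PCIE_PER_LANE = 250     # one PCI Express lane, per direction


def xmc_bandwidth(lanes: int) -> int:
    """Peak PCI Express bandwidth (Mbytes/s) for a given lane count."""
    if lanes not in (1, 2, 4, 8, 16):
        raise ValueError("lane counts considered here are 1, 2, 4, 8 or 16")
    return lanes * PCIE_PER_LANE


def gain_over_pmc(lanes: int) -> float:
    """Ratio of XMC PCI Express bandwidth to 32-bit/66 MHz PCI."""
    return xmc_bandwidth(lanes) / PCI_32BIT_66MHZ


for lanes in (4, 8, 16):
    print(f"x{lanes}: {xmc_bandwidth(lanes)} Mbytes/s, {gain_over_pmc(lanes):.1f}x PCI")
```

Under these assumptions a x16 link works out to roughly 15 times the PCI figure, and the x4 and x8 links most likely to be used on XMC to roughly 4 and 8 times.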

The advent of XMC is a welcome improvement
for graphics mezzanine cards.
Graphics processing units (GPUs) have 
made great strides in performance, with, 
for example, Accelerated Graphic Port 
(AGP)-based GPUs advancing from 1x 
to 2x and most recently up to 8x versions. 
However, PMC cards typically operate at 
PCI 32-bit/66 MHz. 

XMC provides a much needed re- 
sponse to this near stasis in graphics mez- 
zanine I/O performance by allowing the 
base card to interface to the native GPU 
interface without bridging, and hence pro- 
viding higher performance levels that are 
much closer to those of high-end commercial
systems. Today, high levels of
graphics performance for embedded
COTS systems are only obtainable
through custom solutions. XMC and serial
digital video promise to enable cost-effective
solutions through open standards.
Mezzanine Soup to Nuts


FPGAs Revolutionize
Mezzanine Boards 


FPGAs are changing the architectures of mezzanine cards and extending 
their functions. Using them to implement real-time signal processing, 
high-level local control functions and high-speed interfaces is
revolutionizing both mezzanine and system design.


by Rodger H. Hosking 
Pentek 


For many decades, mezzanine boards,
also called daughter cards, have
proven to be an essential and highly
effective strategy for configuring embed-
ded systems to meet the specific needs
of a wide range of applications. System
designers regularly exploit their modular-
ity by using them to add various types of
interfaces to processor boards, thus creat-
ing custom systems from standard COTS
products. In the past, mezzanine boards
consisted mainly of specialized connec-
tors, driver/receiver circuits, modems,
UARTs and ASICs dedicated to a par-
ticular interface. Board programmability
was limited to a few fixed, pre-defined
functions.

In recent years, FPGAs have dra-
matically changed the architectures and
extended the functions of mezzanine
boards in many different ways. Not only
can FPGAs be configured to implement
numerous electrical interface standards,
they can also implement a variety of pro-
tocol engines. In this way, one FPGA-
based product can replace several legacy
products. Through reconfiguration, that
FPGA-based product can also be adapted
to new standards and protocols to help
safeguard against product obsolescence,
at the level of both board and deployed
system.

Although these are laudable gains, 
the real benefits FPGAs bring to mezza- 
nine boards stem from their ability to im- 
plement real-time signal processing, high- 
level local control functions and high- 
speed interfaces. By doing so, FPGAs 
have revolutionized both mezzanines and 
embedded system design. 


Advanced Digital Signal 
Processing Functions 

Data rates at the front end of the 
mezzanine board are often quite high, 
especially for new network and storage 
interface standards with multiple chan- 
nels and gigabit signaling rates. For these 
standardized interfaces, such as between 
a host processor and a communication 
channel like Ethernet, an ASIC is usually 
the best solution for handling the neces- 
sary protocol tasks. 

However, acquisition of wideband 
analog signals for radar, satcom and com- 
munication systems requires A/D con- 
verters operating at sampling rates of 100 
MHz and often much higher. Because of 
the variety of signal types and frequency 
characteristics, signal processing tasks 
tend to be unique to each system.
As a result, there is no standard ASIC 
available that can handle a wide range of 
applications. 


This forces the system designer to 
find the best way to process the tremen- 
dous amount of data generated by these 
data converters. If that data cannot be 
handled on the mezzanine card, the task 
of processing it must fall squarely on the 
shoulders of the CPU or DSP board that 
hosts the mezzanine card. But performing 
these data extraction and protocol process- 
ing tasks at the interface data rates often 
consumes a major portion of a processor’s 
horsepower, thereby increasing the num- 
ber of processors and causing an immedi- 
ate impact on system size and cost. 

Fortunately, FPGAs provide a nearly 
ideal solution to this dilemma. Consis- 
tent with advances in silicon technology, 
each new generation of FPGA devices 
delivers faster speeds, improved density, 
larger memory resources and more flex- 
ible interfaces. One major watershed 
event for FPGAs was the recent incorpo- 
ration of hardware multipliers. Because 
multiplication is an essential operation 
in nearly every digital signal processing 
algorithm, hardware multipliers have af- 
forded FPGAs a strategic entry into DSP 
applications. 
An FPGA-based 256-channel narrowband digital down-converter exploits parallelism of DSP resources. FPGAs can
be used to implement advanced signal processing functions at the front end of the mezzanine board. 


Some of the most popular func- 
tions of mezzanine boards include digital 
downconversion for narrowband commu- 
nication systems, FFT processing for ra- 
dar applications and coding/decoding for 
broadband wireless networks. These tasks 
are tailor-made for FPGAs: they are well- 
defined, processing-intensive tasks that 
can take advantage of the parallel struc- 
tures within the FPGA to achieve real- 
time processing at rates matching those of 
high-speed data converters. 

Unlike general-purpose processors 
that must perform multiplications seri- 
ally, FPGAs can be configured to execute 
hundreds of multiplications in parallel. 
This makes them highly complementary
companions to programmable CPUs and
DSPs, which are far better at handling 
more complex high-level tasks executing 
through a C program. 

Fortunately, because of their ability to 
handle diverse logical and electrical inter- 
face signals, the natural place for FPGAs 
on mezzanine cards is between the high- 
speed converters and the mezzanine bus. 
This is also precisely the best point for 
front-end DSP processing, further fueling 
the adoption and acceptance of FPGAs 
for performing signal processing on mez- 
zanines. 

Replacing multiple system boards, a single FPGA-based mezzanine module
functions as a tracking receiver that automatically tunes frequency-agile
signals.


For example, a 256-channel narrowband
digital down-converter IP core can
fit within a single Virtex-II Pro FPGA
(Figure 1). When these cores are installed
on FPGA-based PCI mezzanine card 
(PMC) modules, the modules can achieve 
a tremendous channel density with all fre- 
quency translation and baseband filtering 
performed ahead of the host processor. 
In this example, not only does the FPGA 
offload the host CPU, it also replaces 
64 four-channel digital down-converter 
ASIC chips, which would not have fit on 
the PMC. 
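
The signal path of a single down-converter channel can be sketched in software. This is a serial Python model, not FPGA code: it assumes a complex local oscillator followed by a simple averaging (boxcar) filter and decimation, whereas a production IP core would typically use more selective filters. The function name `ddc` and the test tone are illustrative.

```python
import cmath
import math


def ddc(samples, f_tune, fs, decimation):
    """Mix real samples down by f_tune, then average and decimate.

    Every input sample costs one multiply by the local oscillator; an FPGA
    can run many such multiply-accumulate chains side by side, one per channel.
    """
    out = []
    for start in range(0, len(samples) - decimation + 1, decimation):
        acc = 0j
        for n in range(start, start + decimation):
            lo = cmath.exp(-2j * math.pi * f_tune * n / fs)  # local oscillator
            acc += samples[n] * lo
        out.append(acc / decimation)  # boxcar low-pass filter output
    return out


fs = 100e6   # sampling rate in Hz (example value)
f0 = 25e6    # frequency of the test tone in Hz
x = [math.cos(2 * math.pi * f0 * n / fs) for n in range(64)]
baseband = ddc(x, f_tune=f0, fs=fs, decimation=8)
```

With the tuning frequency set to the tone frequency, each output sample settles at a constant complex value near 0.5, which is the tone translated to DC at one-eighth of the input sample rate.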


Complex Local Control 
Functions 

As mezzanine boards acquire higher- 
level functions by virtue of newly ac- 
quired DSP capabilities, the need to con- 
trol those functions in real time becomes 
more critical. For example, a scanning 
communications receiver may need to 
monitor the airwaves for unknown sig- 
nal frequencies and tune those signals 
for acquisition and storage. If the host 
processor is burdened with this task, re- 
action time may be compromised by the 
operating system or execution of other
tasks. Therefore, for tough real-time con- 
straints on mezzanine boards it is often 
necessary to implement the control func- 
tions on the mezzanine itself. 

Two traditional control solutions afforded
by programmable logic are state
machines and simple combinatorial logic.
These can be highly effective for simpler 
functions but can be difficult to develop as 
control algorithm complexity increases. 

To address these situations, a com- 
pletely new type of FPGA resource is 
now available. One or two PowerPC RISC 
controllers are embedded within Virtex-II 
Pro FPGA family devices. Provision for 
external flash memory allows the controller
to boot a control program from that
memory and execute it using either internal
or external RAM. A software tool suite
includes libraries, compilers, assemblers 
and debuggers to simplify development of 
complex programs written in C. 

For example, a complete tracking re- 
ceiver uses an embedded PowerPC RISC 
controller (Figure 2). Samples from the A/D
converters are processed with an FFT
IP core, delivering frequency bin output 
energy levels to the PowerPC for analysis. 
The signal frequency deemed worthy of 
tracking is identified and the appropriate 
tuning commands are sent to the digital 
down-converter IP core to down-convert
that signal. If the frequency of the signal 
periodically shifts to thwart reception, the 
detection algorithm adjusts the tuning fre- 
quency to implement an adaptive tracking 
function. The baseband output signal and 
the latest tuning information are delivered 
across the mezzanine interface to the host 
processor. 
VITA   Description        Type   J15 Only                          J15 and J16
Std                              Lanes x Clock     Transfer Rate   Lanes x Clock      Transfer Rate
42.1   Parallel RapidIO   Par    8 x 500 MHz       1 GB/sec        16 x 500 MHz       2.0 GB/sec
42.2   Serial RapidIO     Ser    8 x 3.125 GHz     2.5 GB/sec      16 x 3.125 GHz     5.0 GB/sec
42.3   PCI Express        Ser    8 x 2.5 GHz       2.0 GB/sec      16 x 2.5 GHz       4.0 GB/sec
42.4   HyperTransport     Par    8 x 800 MHz       1.6 GB/sec      16 x 800 MHz       3.2 GB/sec
42.5   Aurora             Ser    8 x 3.125 GHz     2.5 GB/sec      16 x 3.125 GHz     5.0 GB/sec


VITA 42 XMC sub-specifications define specific implementations for widely 
used fabrics. Popular serial clock rates and the resulting transfer rates in 
each direction are shown for either one (J15) or two (J15 and J16) XMC
connectors.


In this example, a single mezzanine 
module assumes the role of an entire 
tracking system that previously might 
have required several system boards for 
implementation. 
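
The control loop of such a tracking receiver can be sketched as follows. This is a software illustration only: a naive DFT stands in for the FFT IP core, the returned frequency stands in for the tuning command sent to the down-converter, and all function names are hypothetical.

```python
import cmath
import math


def bin_energies(samples):
    """Energy in each positive-frequency bin (naive DFT standing in for the FFT core)."""
    n = len(samples)
    energies = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(acc) ** 2)
    return energies


def choose_tuning_frequency(samples, fs):
    """Pick the strongest bin and convert it to a tuning frequency in Hz."""
    energies = bin_energies(samples)
    peak_bin = max(range(len(energies)), key=energies.__getitem__)
    return peak_bin * fs / len(samples)


def track(blocks, fs):
    """Re-tune once per block, following a signal that hops between blocks."""
    return [choose_tuning_frequency(block, fs) for block in blocks]


def tone(freq, fs, n):
    """Generate n samples of a cosine test tone."""
    return [math.cos(2 * math.pi * freq * t / fs) for t in range(n)]


fs = 64.0
hops = [tone(5.0, fs, 64), tone(12.0, fs, 64)]  # signal shifts from 5 Hz to 12 Hz
tunings = track(hops, fs)
```

Here `tunings` follows the signal from one block to the next, which is the adaptive behavior the paragraph above describes. A real detector would also need a threshold to decide whether any bin is worth tracking at all.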


New Gigabit Serial Interfaces 

The data rate demands of high-speed 
networks, advanced RISC and DSP pro- 
cessors, A/D and D/A converters, video 
devices, storage devices, communica- 
tion channels and other peripherals have 
swamped the traditional backplanes and 
bus architectures that have served embed- 
ded systems admirably for more than 30 
years. Coming to the rescue are new gigabit
serial links and several protocol standards
that move data at least 10
times faster.
Methods of adapting these new
links to mezzanine boards for embed- 
ded systems are spelled out in the VITA 
42 standard, also known as XMC. As an 
extension to the widely used PMC, XMC 
defines two new connectors that join the 
mezzanine board to the host or carrier 
board. Each connector provides up to 
20 differential gigabit signal paths, 10 in 
each direction. One popular implementa- 
tion calls for two 4x full-duplex links per 
connector. With today’s serial bit rates of 
3.125 GHz, this translates into 2.5 Gbytes/s
in each direction for each connector.
This is a big increase from the few hun- 
dred megabytes per second common for 
PCI bus transfers. 
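
The 2.5 Gbytes/s figure can be reproduced with a short calculation. The sketch below assumes 8b/10b line coding, in which 10 bits on the wire carry 8 bits of data; that coding is commonly used on links of this generation but is an assumption here, as is the function name.

```python
def payload_gbytes_per_s(lanes: int, line_rate_gbaud: float) -> float:
    """Per-direction payload rate, assuming 8b/10b line coding."""
    payload_gbits = lanes * line_rate_gbaud * 8 / 10  # strip coding overhead
    return payload_gbits / 8                          # bits to bytes


# Two 4x links per connector give 8 lanes in each direction.
per_connector = payload_gbytes_per_s(lanes=8, line_rate_gbaud=3.125)
both_connectors = payload_gbytes_per_s(lanes=16, line_rate_gbaud=3.125)
at_2_5_gbaud = payload_gbytes_per_s(lanes=8, line_rate_gbaud=2.5)

print(per_connector, both_connectors, at_2_5_gbaud)
```

The same function gives 2.0 Gbytes/s for eight lanes at 2.5 GHz, the serial clock rate quoted for PCI Express.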

In Pentek’s Model 7140 dual-channel software radio transceiver XMC mezzanine module, an FPGA acts as the interface
to data converter, ASIC, memory and interface resources.

Once again, FPGAs are the primary
enabling technology for these new links.
Recent FPGA offerings from Xilinx feature
gigabit transceivers called RocketIO,
while Altera’s counterparts are the
Stratix-GX Multi-Gigabit Transceivers 
(MGTs). Behind these physical interface 
drivers are channel encoders and decod- 
ers that perform serial/parallel conversion 
so that data and clock are combined in the 
signaling on each differential pair over the 
external serial channel. Both receive and 
transmit circuitry is built into the FPGA 
and is commonly referred to as serializer/ 
deserializer (SERDES). 

A protocol engine within the FPGA 
interfaces with the SERDES to correctly 
process packets, header information, con- 
trol functions, error detection and correc- 
tion and payload data format. Since each 
switched serial fabric standard has its own 
protocols and rules, FPGAs offer excellent 
flexibility by allowing users to install the 
appropriate IP core protocol engine. The 
strategy makes FPGA-based XMC mod- 
ules truly fabric-agnostic and allows one 
hardware design to be deployed in several 
different fabric environments. Various 
sub-specifications for VITA 42 define the
implementation of popular switched fabric
standards (Figure 3).

The Xilinx Aurora protocol, defined 
for use in XMC as VITA 42.5, is a light- 
weight link-layer protocol that does not 
implement the packet routing and switch- 
ing features of the other more complete 
protocols. However, it has many advan- 
tages: the IP core is much smaller, it is 
available free from Xilinx, there are no 
associated runtime licensing fees and the 
reduced packet overhead makes data pay- 
load transmission more efficient. Since 
many mezzanine applications can be sat- 
isfied with this type of dedicated point- 
to-point connection, Aurora becomes a 
very attractive option. A similar protocol, 
called SeriaLite, is offered by Altera. 


Putting It All Together 

Pentek’s Model 7140 dual-channel, 
software radio transceiver XMC mez- 
zanine module illustrates how this new 
FPGA technology has been deployed (Fig- 
ure 4). In this mainstream COTS product, 
a Virtex-II Pro FPGA acts as the interface 
to all data converter, ASIC, memory and
interface resources on the module. With
232 hardware multipliers, this FPGA eas- 
ily accommodates the 256-channel digital 
down-converter IP core, for example. The 
two PowerPC processors with attached 
SDRAM and flash memories support the 
tracking receiver application. The RocketIO
interface implements the direct connection
to the XMC connector for the dual
4x serial fabric ports with a combined data
rate capacity of 2.5 Gbytes/s.

Since all of these FPGA features are 
now standard resources found on most 
later-generation devices, new mezzanine 
board designs can easily take advantage 
of them without incurring additional cost. 
This ensures the rapid adoption of these 
resources for embedded systems being 
developed today and also helps establish 
standards and methodology for future system
architectures.
Mezzanine Soup to Nuts


iSCSI Brings Distributed
Storage to Mezzanine Clients 


Traditional storage architectures for embedded, high-reliability 
applications require physically separate, application-specific sub- 
networks. iSCSI offers freedom from direct attachment to storage devices, 
opening distributed systems to a much wider variety of architectures. 


by Stan McClellan 
SBE 


As mobile devices become more
[...], continuous network con-
nectivity is increasingly prevalent
in military, enterprise and consumer
environments. Consequently, access to
mass storage by large numbers of highly
distributed clients will lead the way to a
new class of applications where lower
cost, ease of management, expandability 
and unlimited connectivity will be key 
performance requirements. 

iSCSI and other storage architectures differ in several ways. With iSCSI,
the file server or application server component is actually an iSCSI target
(server) embedded in a gateway device or in the storage device itself.

The rise of mezzanine-led architectures
such as ATCA and MicroTCA
presents an excellent opportunity for
Internet Protocol (IP)-based storage.
Ethernet is the dominant backplane ar- 
chitecture for the current generation of 
ATCA deployments, which tend to center
on control-plane applications. The Internet
Small Computer System Interface
(iSCSI) is a good match for IP-over-Ethernet
from blade or mezzanine module
clients reaching across the backplane to 
in-chassis, blade-based storage, as well 
as out-of-chassis network-based storage. 
Additionally, embedded iSCSI target 
implementations that “virtualize” exter- 
nal storage for intra-chassis use may be 
deployed as mezzanine-based storage 
adapters for bridging to legacy storage 
deployments. 


iSCSI in Distributed Storage 
Applications 

High-end embedded applications 
can benefit from using iSCSI for linking 
data storage systems. By carrying SCSI 
commands over IP networks, iSCSI helps 
facilitate data transfers over intranets 
and manage storage over long distances. 
In the IP-based Multimedia Subsystem of next-generation core telephony
networks, 3GPP IP Multimedia Subsystem (IMS), network elements such
as the media resource function (MRF) and media gateway (MGW) access
large amounts of storage.


Embedded devices can access “virtually 
local” storage via arbitrary network con- 
nections, without expensive or specialized 
host adapter hardware. 

iSCSI is designed to transport block 
I/O via an IP network rather than files 
from a filesystem. The client-side handler 
for the iSCSI protocol is called an initia- 
tor, while the server-side handler is called 
a target. Session management and other 
communication between client and server 
endpoints are handled via well-known 
SCSI command sets, which are carried 
across the intervening TCP/IP network. 
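
The division of labor between initiator and target can be illustrated with a toy model. The sketch below runs in a single process and omits the TCP connection, login and iSCSI packet framing entirely; it only shows a client building a standard 10-byte SCSI READ(10) command block and a server executing it against block storage. The class names and the 512-byte block size are assumptions made for the example.

```python
import struct

BLOCK_SIZE = 512   # bytes per logical block (assumed for this example)
READ_10 = 0x28     # SCSI READ(10) operation code


def build_read10(lba: int, num_blocks: int) -> bytes:
    """Pack a READ(10) command: opcode, flags, 32-bit LBA, group, 16-bit length, control."""
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)


class ToyTarget:
    """Server side: owns the blocks and executes commands it receives."""

    def __init__(self, num_blocks: int):
        self.disk = bytearray(num_blocks * BLOCK_SIZE)

    def execute(self, cdb: bytes) -> bytes:
        opcode, _flags, lba, _group, count, _control = struct.unpack(">BBIBHB", cdb)
        if opcode != READ_10:
            raise ValueError("only READ(10) is modeled here")
        start = lba * BLOCK_SIZE
        return bytes(self.disk[start:start + count * BLOCK_SIZE])


class ToyInitiator:
    """Client side: asks for blocks by address, never for files."""

    def __init__(self, target: ToyTarget):
        self.target = target  # stands in for a session over a TCP/IP network

    def read_blocks(self, lba: int, num_blocks: int) -> bytes:
        return self.target.execute(build_read10(lba, num_blocks))


target = ToyTarget(num_blocks=16)
target.disk[2 * BLOCK_SIZE:2 * BLOCK_SIZE + 5] = b"hello"
initiator = ToyInitiator(target)
data = initiator.read_blocks(lba=2, num_blocks=1)
```

The initiator sees only numbered blocks. Any filesystem sits above it on the client, which is the difference from file-level protocols that the paragraph above points out.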


Embedded within the interactions
between iSCSI initiators and targets are
a number of important considerations.
These issues arise directly from the ses-
sion-oriented nature of the protocol sets,
the general unreliability and insecurity of
IP networks and the application scenarios
that result from this powerful, new dis-
tributed storage paradigm.


ERL 0: Session Failure
  Error conditions: Any kind of error
  Response to error: Terminate the session: close TCP connections;
    complete pending SCSI commands with an error status
  Recovery mechanisms: Re-start a new session: open new TCP connections;
    re-issue failed SCSI commands

ERL 1: Digest Failure
  Error conditions: Sequence errors, timeouts, or errors in 32-bit iSCSI
    checksum on packets
  Response to error: Receiver selectively acknowledges errored packets (SNACK)
  Recovery mechanisms: Packets with errors can be re-transmitted

ERL 2: Connection Failure
  Error conditions: TCP connection failures or equivalent
  Response to error: Transfer “allegiance” of thread for the SCSI
    command/data context to a different TCP connection
  Recovery mechanisms: May require connections to be rebuilt and iSCSI
    packets to be re-transmitted


iSCSI vs. Other Storage
Architectures

iSCSI differs from other storage ar-
chitectures in several ways (Figure 1).
In direct-attached storage (DAS), a stor-
age device is directly attached to a host
node via a local bus. Embedded systems
often use DAS to store OS images and 
local data, while using SCSI or Serial At- 
tached SCSI (SAS) storage protocols to 
access the data. 

Network-attached storage (NAS) re- 
fers to a host that provides network-based 
access to some local storage, possibly 
DAS, via a client/server paradigm such 
as Network File System (NFS). Typically, 
NAS-based approaches lack scalability 
and experience serious performance bot- 
tlenecks due to the layering of filesystem 
and application-level structures on top of 
the storage transport layer. 

A storage area network (SAN) con- 
sists of dedicated storage devices on a 
private storage-optimized network, or 
fabric, which share access to media via an 
arbitration scheme. Fibre Channel (FC) is 
arguably the most common form of SAN. 
Like FC, iSCSI uses the SCSI storage pro- 
tocol to access media. Unlike FC, iSCSI 
is agnostic of the network transport layer 
and has no distance limitations. 


Embedding iSCSI in Different 
Application Scenarios 

Most application scenarios for iSCSI 
deployment focus on simplified access to 
aggregated storage. The initiator/target 
architecture of the protocol lends itself to 
a form of highly scalable disk virtualiza- 
tion. Highly portable iSCSI software so- 
lutions allow system designers to easily 
embed this storage architecture into any 
hardware device—ASIC, mezzanine card, 
blade or server—running under any OS. 

Significance of each error recovery level (ERL):

ERL 0: Simplest form of response to errors in the session. All
recovery defaults to usual SCSI command set error
conditions and responses. Required in any compliant
implementation.

ERL 1: Most common form of response to errors. Adds
application-specific enhancement to TCP’s per-packet
mechanisms, but may be computationally expensive.

ERL 2: Most robust form of response to errors. Upper-level SCSI
driver remains unaware that failed commands (due to
network errors) have been retried or moved to alternate
TCP connections.

iSCSI error recovery levels are based on how much the native SCSI command set is shielded from awareness of
network errors.


Many functions in telephony networks
require access to storage. These
requirements are typically codified in
industry standards such as the IP Multi- 
media Subsystem of next-generation core 
telephony networks: 3GPP IP Multimedia
Subsystem (IMS) (Figure 2). Here,
user devices access pre-stored multime- 
dia data, including on-demand audio and 
video streams. Network elements, such 
as the media resource function (MRF) 
and media gateway (MGW), must access 
large amounts of storage to feed media 
streams to client devices, source media 
streams for re-encoding and delivery, 
process audio streams for voice recog- 
nition or pattern matching and evaluate 
subscriber profiles or record session sta- 
tistics. iSCSI provides a highly scalable, 
cost-effective interface between these 
network elements and virtual disks that 
contain media data, legacy subscriber 
data or accounting information. 

As a cost-effective enabling technology,
an iSCSI initiator embedded in the
OS of mobile devices provides interest- 
ing possibilities for distributed system 
architectures. 

Military and distributed signal in- 
telligence applications typically involve 
read/write access to massive quantities of 
data, including multi-spectral imagery, 
elevation profiles, template data for tar- 
get recognition or other application-spe- 
cific files for pre- or post-processing. In 
these cases, iSCSI simplifies system ad- 
ministration for client devices since the 
location, description and capabilities of 
the virtualized disk are centralized and 
completely hidden from the local config- 
uration on the distributed endpoints. 

The session management inherent 
in the iSCSI protocol and network trans- 
ports provides a wide variety of fault- 
tolerant mechanisms for mission-criti- 
cal applications. These characteristics 
make iSCSI a good match for environ- 
ments where network topology changes 
rapidly—such as mobile, peer-to-peer 
or mesh networks—or for environments 
where large amounts of data must be 
centrally stored but accessed by distrib- 
uted clients with limited local facilities. 

iSCSI can be used here for high-resolution
video monitoring for building security
and remote, unattended cameras used
for automobile traffic management. It
may also be useful as a replacement storage
technology for the analog videocassette-based
dashboard cameras often used
in law enforcement or for portable personal
entertainment and gaming systems.

iSCSI software is used to embed this storage architecture into any
hardware device running under any OS. SBE’s iSCSI software provides
maximum redundancy and reliability, and enables location-independent
data storage and retrieval via a user-friendly, comprehensive GUI.

Incorporating legacy storage devices 
into an iSCSI paradigm is a matter of 
front-ending the legacy systems with an 
iSCSI target or target portal. This archi- 
tecture can be used to simplify and vir- 
tualize heterogeneous storage devices into 
a single manageable, scalable and distrib- 
uted disk farm with centralized, network- 
based control. 

A number of enterprise-class SAN 
routers are already available with an 
iSCSI upstream interface layered over 
multiple 1 Gbit Ethernet links. On their
downstream sides, these gateway devices 
use FC, SCSI, SAS, SATA or other DAS 
interfaces. 

Deployments that don’t require inter- 
networking with legacy storage systems 
can already take advantage of embedded 
iSCSI target implementations in array controllers.
It is only a matter of time before
generic target implementations appear in 
embeddable board-level subsystems as 
well as silicon systems-on-chip. 


Deployment Considerations 

When direct dependence on IP-based 
networking is involved, the usual issues 
related to network architecture, security 
and availability must be considered, not 
only in the machine-to-machine com- 
munication model but also in the highly 
distributed nature of the initiator/target 
architecture. 


One of the primary arguments 
against storage-over-IP architectures 
is that general-purpose networks and 
computers cannot properly handle the 
unique needs of storage traffic. The non- 
deterministic end-to-end behavior of IP 
networks and TCP/IP stacks can create 
serious problems when introduced into 
communication channels that need deter- 
ministic performance. 

This results in a strong need for 
implementing complete iSCSI stacks via 
embedded system paradigms. Dedicated, 
single-focus, storage-specific imple- 
mentations based on iSCSI will relieve 
the performance limitations inherent in 
multi-user systems with overworked net- 
work interfaces. 

Although scalability through distri- 
bution is the primary strength of IP-based 
network architectures, this benefit often 
comes at the expense of throughput and la- 
tency. Transporting storage traffic over IP 
networks requires some network engineer- 
ing to accommodate application-specific 
latency and throughput requirements. For- 
tunately, several technologies at the link 
layer and above can be effective in guar- 
anteeing certain types of service for iSCSI 
traffic. In some cases, physical segregation 
may be required, such as is anticipated by 
the multiple physical links of an ATCA 
Fabric Channel via PICMG 3.1, option 3. 

In either the local area or wide area
case, the use of a parallel, physically
segregated network path results in
better performance, but at the higher
expense of duplicated facilities. However,
in either case the use of iSCSI as
the storage transport protocol provides
flexibility that is not available via
conventional SAN or DAS paradigms.

With DAS and SAN architectures, 
traffic between hosts and storage devices 
is not typically at risk from external en- 
tities. With iSCSI storage, however, the 
intervening IP network becomes a sig- 
nificant security concern, since it may be 
accessible by potentially malicious users. 

A primary issue concerning iSCSI 
sessions and endpoints is weaknesses in 
authentication approaches. Many iSCSI 
implementations allow for deployment 
without any form of security measures. 
Although native iSCSI security mecha- 
nisms may prevent some forms of unau- 
thorized activity, they provide no signifi- 
cant protection for ongoing validation of 
session integrity. 

Consequently, two options exist for
the secure transport of iSCSI session
interactions: physical isolation of the
storage subnetworks, or transporting iSCSI
packets via encrypted tunnels. The first
involves redundant, application-specific 
networks, which negates much of a con- 
verged network infrastructure’s value. 
For encrypting storage traffic, the most 
likely candidate is the IP security frame- 
work (IPsec). 

Availability of network endpoints is 
based on two primary concepts: the abil- 
ity to detect when a command or other 
data has not been accurately delivered 
and the sequence of steps taken in reac- 
tion to a detected failure. 

Detection of failed communication 
is typically straightforward, particularly 
for packet-based networks. Approaches 
to detection tend to be limited to well- 
defined classes based on channel char- 
acteristics and related to such factors as 
acknowledgments, timeouts and data- 
corruption mechanisms. 

iSCSI differentiates error recovery 
tasks between initiator and target. The 
initiator maintains enough outstanding 
history so that it can re-issue garbled 
commands or re-transmit lost data. The 
target maintains history for unacknowl- 
edged data and status responses. 

In general, the distinction between 
iSCSI error recovery levels (ERLs) 
(Figure 3) is based on how much effort
is expended to shield the native SCSI
command set from awareness of network
errors. The most basic case, ERL0,
simply aborts all iSCSI activity at the 
first sign of a network error and lever- 
ages existing SCSI-level error recovery 
mechanisms. Higher error recovery lev- 
els, ERL1 and ERL2, attempt to respond 
to network-related errors using applica- 
tion-specific per-packet checksum and 
timeout mechanisms. In the most complex
case, ERL2, which is supported by
SBE’s iSCSI software (Figure 4), alternate
TCP connections are used to “silently”
reconstruct command context within an
existing session.
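
As a rough illustration of the three levels, the sketch below maps a detected error to the recovery action each level would attempt. The enum, the error names and the `recover` function are hypothetical and are not taken from SBE's software or any other iSCSI implementation; the mapping follows only the description above.

```python
from enum import IntEnum


class ERL(IntEnum):
    """Error recovery levels as described in the text (names are illustrative)."""
    ERL0 = 0  # abort iSCSI activity, rely on SCSI-level recovery
    ERL1 = 1  # also recover from per-packet checksum and timeout errors
    ERL2 = 2  # also rebuild command context on an alternate TCP connection


def recover(level: ERL, error: str) -> str:
    """Return the recovery action for a detected error (hypothetical names)."""
    if error in ("checksum", "timeout") and level >= ERL.ERL1:
        return "retry within the iSCSI session"
    if error == "connection_lost" and level >= ERL.ERL2:
        return "move command to an alternate TCP connection"
    # Everything else, and every error at ERL0, is left to the SCSI layer.
    return "abort iSCSI activity and use SCSI-level recovery"
```

The higher the level, the fewer network errors ever reach the SCSI command set, which is the distinction the article draws between the levels.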

In high-availability systems or 
scenarios where latency is critical, it is 
especially important to make the same 
content or function available via more 
than one access path, which is achievable
via multipathing. iSCSI multipathing
can be particularly useful for deployment
environments where network
connectivity between client and server is 
not always guaranteed or does not have 
deterministic latencies, such as wireless. 
When used in conjunction with ERL fa- 
cilities, multipathed iSCSI deployments 
can achieve very high reliability. 
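
The idea of reaching the same content over more than one access path can be sketched as a simple failover loop. This is a minimal sketch, assuming each path is a callable that either returns the requested block or raises `OSError`; the `read_block` helper and the path callables are hypothetical and do not correspond to any real iSCSI initiator API.

```python
from typing import Callable, Optional, Sequence


def read_block(paths: Sequence[Callable[[int], bytes]], lba: int) -> bytes:
    """Try each access path in order and return the first successful read."""
    last_error: Optional[BaseException] = None
    for path in paths:
        try:
            return path(lba)
        except OSError as exc:  # covers connection loss and timeouts
            last_error = exc
    raise OSError("all access paths failed") from last_error
```

A real multipath layer would also track path health and balance load across paths; the loop shows only the failover behavior that matters for availability.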


Evaluating iSCSI Performance 

Storage is such a fundamental re- 
quirement that it appears in a wide va- 
riety of forms in different system archi- 
tectures. To characterize system perfor- 
mance, SBE conducted benchmarks of 
an iSCSI storage system in two different 
contexts: a stand-alone enterprise-class 
storage appliance and an embedded, 
ruggedized storage gateway. 

In these tests, the 1 GHz PowerPC 
CPU of the embedded target device and 
the 2.2 GHz Intel-based CPU of the en- 
terprise-class system achieved greater 
than 73 Mbytes/s of throughput per GHz 
of CPU. This normalized performance 
was consistent regardless of the number 
of disks in use, the nature of the opera- 
tions, such as READ and WRITE, or 
the type of disk technology, such as SAS 
and SATA. 
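
To make the normalized figure concrete, the short calculation below converts it into a lower bound for each of the two CPUs mentioned. It assumes throughput scales linearly with clock rate, which is what the per-GHz normalization implies; the article reports only the ratio, so the absolute numbers are derived, not measured.

```python
MBYTES_PER_S_PER_GHZ = 73  # lower bound quoted in the text


def min_throughput_mbytes(cpu_ghz: float) -> float:
    """Lower bound on throughput implied by the per-GHz figure."""
    return MBYTES_PER_S_PER_GHZ * cpu_ghz


for name, ghz in (("embedded PowerPC", 1.0), ("enterprise Intel-based", 2.2)):
    print(f"{name} ({ghz} GHz): more than {min_throughput_mbytes(ghz):.1f} Mbytes/s")
```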

Depending on system characteristics,
hardware acceleration of both
iSCSI and TCP may improve performance
by more than 20%. These normalized
measurements describe consistent,
robust iSCSI system performance
regardless of significant variations in
system architecture.


The Attraction of iSCSI 


The tradeoffs among simplified 
application architecture, flexibility of 
network connections and benefits of vir- 
tually local storage make iSCSI an at- 
tractive technology for high-reliability 
embedded systems. Traditional storage 
architectures have required physically 
separate, application-specific subnet- 
works for storage devices. These are 
characterized by strict distance limita- 
tions and high cost, including the costs 
of specialized management technologies 
and specialized interfaces for client de- 
vices. 

In contrast, iSCSI utilizes an
industry-standard, off-the-shelf network
paradigm, IP-over-Ethernet, for both
data and storage traffic. This IP network
transport method can extend over arbitrary
distances and network infrastructures,
leveraging existing investments
in management tools, technologies and
strategies. Additionally, the simplified
application architecture allows client
devices with any type of network connectivity
to address unlimited data via the
virtual storage model.
San Ramon, CA.
(925) 355-2000. 
[www.sbei.com]. 
Small, Fast, High Throughput





WIN Enterprises works with world-leading 
semiconductor manufacturers in applying high- 
performance processors to small form factor, 
embedded OEM solutions. 


WIN’s MB-06047, jointly developed with AMD™, 
is a long-life single board computer (SBC) that 
optimizes the small form factor with high- 
performance processing power. By applying the 
AMD Opteron™ dual-core processor and other 
state-of-the-art components to an EBX form factor 
(it's a meager 5.75” x 8”), WIN offers the highest 
performance, smallest form factor available
to OEMs.


A stackable HyperTransport™ connector and 
PCI Express™ maximize system throughput. 
Advanced AMD microarchitecture means low 
relative heat production. OEMs utilize the 
MB-06047 as a controller or as part of a 
complete embedded solution where small 
footprint, high performance, low heat are 
requirements. 


MB-06047 EBX Form Factor
Features
• Low-power AMD Opteron™ Processor
• nVidia nForce™ Professional 2200 Chipset
• Quad-ranked memory
• Two 10/100/1000 Gigabit Ethernet Ports
• Stackable HyperTransport™ Connector
• 16-Lane PCI Express™ Slot
• CompactFlash™ & Express Card™ sockets
• 4x SATA...and more


The MB-06047 is an off-the-shelf (COTS)
AMD reference design available now, ideal
for intense networking, imaging, storage and
gaming applications. Solutions utilizing
dual, dual-core AMD Opteron™ processors are
also available from WIN Enterprises.


Contact WIN Enterprises 
(978) 688-2000 x23 
sales@win-ent.com 
www.win-ent.com 
PCI Express and InfiniBand


The PCI Express-InfiniBand 
Connection 


PCI Express and InfiniBand initially competed for the same system 
space. Today, they fill two complementary roles, meeting and sometimes 
competing on the backplane. 


by Jack Regula 
PLX Technology 


Those with a long memory will recall
that InfiniBand (IB) was once touted
as the replacement for PCI, destined
to bring a new high-performance,
high-availability I/O model to the enterprise.
The advantages of a switched serial
interconnect seemed overwhelming compared
to the limitations of PCI’s parallel bus
structure and the PCI tree topology. As it
turned out, the weight of PCI’s infrastructure
and its software legacy made it too
heavy to unseat. The need for a complete
new infrastructure, both software and
hardware, proved to be an insurmountable
barrier. After suffering through a
long gestation period, the IB camp suffered
attrition coincident with the bursting
of the tech bubble. Just when we had
written it off, signs began to appear that
IB had secured a niche. Now the surviving
IB players are enjoying its success as a
cluster interconnect in high-performance
computing.

When it became clear that IB would
not replace PCI, the need remained to
reinvent PCI. The lessons of the recent past
were applied to this task (by the usual
suspects), and out of the ashes of the original
IB vision arose PCI Express (PCIe).
PCIe uses a PHY very similar to IB but
retains its PCI heritage. PCIe is configured
just like PCI; its topology is a tree
of PCI-to-PCI bridges in which routing is 
performed by address and device ID (bus, 
device, function number). With PCI-to- 
PCIe bridges, legacy PCI devices can be 
employed with good performance using 
unmodified legacy software. PCIe was 
driven to success by implementation on 
Northbridges and graphics processors, 
followed quickly by storage and network 
adapters for the enterprise. While systems 
still ship with a mix of PCI and PCIe slots,
it is abundantly clear that PCIe is replac- 
ing PCI and AGP, and will soon be as 
ubiquitous as PCI once was. 
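
The device ID used for ID routing packs the bus, device and function numbers into 16 bits: 8 bits of bus, 5 bits of device and 3 bits of function. The helper names below are illustrative only; the bit layout is the standard PCI one.

```python
from typing import Tuple


def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into a 16-bit PCI device ID."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device or function out of range")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(device_id: int) -> Tuple[int, int, int]:
    """Split a 16-bit PCI device ID back into (bus, device, function)."""
    return (device_id >> 8) & 0xFF, (device_id >> 3) & 0x1F, device_id & 0x7
```

Because PCIe keeps this addressing scheme, enumeration software written for PCI can walk a PCIe hierarchy without modification.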

PCIe was initially set on a parallel
technological track with IB but pursued
an entirely different set of applications
and market segments. PCIe provides
cost-optimized but still high-performance I/O
for the desktop and enterprise, limited to
a single root complex, while IB provides
performance-optimized cluster interconnects
scaling beyond 10,000 processing
nodes.

The embedded market has a history 
of taking enterprise and desktop technol- 
ogy and adapting it to its needs. The 
economies of scale thus leveraged are 
compelling. The storage and Ethernet 
controller chips developed for the en- 


terprise and the desktop are used almost 
everywhere such interfaces are needed 
in the embedded space. X86 processors 
compete for tasks formerly performed by 
embedded and communications proces- 
sors. When PLX applied non-transpar- 
ent bridging, which is standard in PCI, to 
PCIe switches, it allowed PCIe to expand 
beyond its nominal single-host limit to
support multiple hosts, failover and systems
with redundant fabrics. In addition
to non-transparent bridging, two PCIe
specification development activities at
the PCI-SIG, I/O Virtualization (IOV)
and the PCIe Cable Standard, will allow
PCIe to encroach further into IB’s
interconnect space. However, it’s important to
look at some of the key elements of these 
developing specifications to see just how 
far the new capabilities extend. 


I/O Virtualization 

For the past 18 months, the PCI-SIG’s
IOV work group has been developing a
standard that will allow the sharing of
I/O adapters by multiple hosts, as well as
by multiple system images running on a
single host. IOV, whose specification the
PCI-SIG expects to complete by the end
of 2006, is part of a larger trend toward
virtualization throughout the enterprise.
Servers themselves are being virtualized
to reduce maintenance costs, increase
system resiliency and make better use of
multi-core processors. When a server’s
workload is divided among applications
with each running under its own guest
operating system, then fault and error side
effects can be constrained to just a single
application. IOV supports that trend by
allowing separate virtual I/O devices, within
a single physical I/O component, to be
assigned to each guest OS, thus limiting the
scope of I/O errors to a single guest OS or
system image.

I/O sharing and IOV are two sides 
of the same coin. I/O sharing is enabled 
when a single I/O device is made to look 
like many, primarily by giving each vir- 
tual instance of the I/O function its own 
set of control and status registers (CSRs). 
The ability to share I/O devices among 
blades provides a compelling advantage 
for PCIe as the backplane interconnect for 
blade servers. I/O can be removed from 
the blades and direct connections made 
from the root complexes to the backplane 
switch, saving both cost and latency. 
The Virtual Hierarchies of a Multi-Host IOV Fabric.


Throughput is increased by stepping up
from 1 Gbit/s to x4 or x8 PCIe. The cost of
10 Gbit/s I/O adapters is amortized over 
all the compute blades in the backplane. 
The IOV standard minimizes the software 
impact of an otherwise revolutionary ad- 
vance. 
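
The size of that step can be estimated with a short calculation. The sketch assumes first-generation PCIe signaling, 2.5 GT/s per lane with 8b/10b encoding, which the article does not state explicitly; the result is raw link bandwidth per direction before protocol overhead.

```python
GT_PER_S_PER_LANE = 2.5       # assumed first-generation PCIe signaling rate
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line coding


def link_gbits_per_s(lanes: int) -> float:
    """Usable bit rate of a PCIe link, per direction, before protocol overhead."""
    return lanes * GT_PER_S_PER_LANE * ENCODING_EFFICIENCY


for lanes in (1, 4, 8):
    print(f"x{lanes}: {link_gbits_per_s(lanes):.0f} Gbit/s per direction")
```

Under these assumptions a x4 link offers roughly eight times, and a x8 link roughly sixteen times, the 1 Gbit/s baseline.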

To share virtual instances of an I/O 
function among multiple hosts, one must 
create a multiple-host-aware fabric. The 
direction chosen by the PCI-SIG is to ex- 
tend packet headers with a host ID field, 
which are added/removed at the host ports 
of switches to allow the use of legacy root 
complexes. Multi-host-aware switches 
implement a separate CSR space, includ- 
ing address and ID routing information, 
for each host using the host ID to select 
the routing information for each packet as 
it passes through the switch. Multi-host- 
aware I/O controllers use the host ID as 
part of the address for incoming packets 
and attach it to outgoing packets to allow 
them to be routed upstream. The result is 
that each host sees a virtual PCI hierarchy 
fully compatible with that of a standard, 
single-host PCIe system. The multi-host 
fabric contains multiple virtual PCI hi- 
erarchies overlaid within a single physi- 
cal switch or fabric, as illustrated via the 
color coding of Figure 1. 
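
The routing scheme described above can be sketched as a toy model. The class, its method names and the dictionary-based packets are hypothetical and stand in for real packet headers; nothing here is taken from the IOV specification itself. The sketch shows only that the host ID is attached at the host port, selects a per-host routing table, and is stripped again on the way upstream.

```python
from typing import Dict, Tuple


class MultiHostSwitch:
    """Toy model of a multi-host-aware switch (hypothetical, illustrative only)."""

    def __init__(self) -> None:
        # One routing table per host ID, so each host sees its own hierarchy.
        self.routes: Dict[int, Dict[int, int]] = {}

    def add_route(self, host_id: int, device_id: int, port: int) -> None:
        self.routes.setdefault(host_id, {})[device_id] = port

    def downstream(self, host_id: int, packet: dict) -> Tuple[int, dict]:
        """Tag a packet entering at a host port and pick its egress port."""
        tagged = dict(packet, host_id=host_id)
        return self.routes[host_id][packet["device_id"]], tagged

    def upstream(self, packet: dict) -> Tuple[int, dict]:
        """Strip the tag so a legacy root complex never sees the host ID."""
        untagged = {k: v for k, v in packet.items() if k != "host_id"}
        return packet["host_id"], untagged


switch = MultiHostSwitch()
switch.add_route(host_id=0, device_id=0x0100, port=4)
switch.add_route(host_id=1, device_id=0x0100, port=5)  # same ID, separate hierarchy
```

Both hosts address device ID 0x0100, yet their packets leave on different ports, which is the overlay of virtual hierarchies the text describes.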
A careful examination of Figure 1 will
show no connections among the separate 
virtual hierarchies. The IOV specification 
is silent on host-to-host communications, 
neither standardizing it nor precluding it. 
It is a relatively simple matter for a switch 
vendor to add non-transparent bridges to 
allow hosts to open windows into each 
other’s domains. It is only slightly more 
complicated for the switch to include a 
DMA engine to speed the transfer of data 
through those windows. Such proprietary 
features fit the embedded and communi- 
cations usage models better than enter- 
prise blade servers, where software and 
security concerns argue against the use of 
PCIe as a host-to-host interconnect.
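
A window through a non-transparent bridge amounts to an address translation: accesses that fall inside a local address range are forwarded to a translated range in the other host's domain. The sketch below is a minimal model of that idea with hypothetical field names; real bridges program the equivalent values into vendor-specific registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One translation window through a non-transparent bridge (illustrative)."""
    local_base: int   # start of the window in this host's address space
    size: int         # window length in bytes
    remote_base: int  # where the window lands in the other host's space

    def translate(self, addr: int) -> int:
        """Map a local address inside the window to the remote address."""
        offset = addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError("address outside the window")
        return self.remote_base + offset
```

Addresses outside the window are rejected, which is how the bridge keeps the rest of each host's domain isolated from the other.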


Cable Standards 

A cable specification is included in 
the IB specification. Because of the close 
match in PHY between the two intercon- 
nects, IB cables will work for PCIe. How- 
ever, PCIe devices don’t drive the cable as 
hard as IB does, so the maximum length 
of passive cable usable for PCIe is lower 
than it is for IB. Preliminary results sug- 
gest that PCIe can drive passive cables 
of approximately seven to eight meters 
in length. Increased lengths are possible
using techniques such as re-driving,
re-timing and/or equalization using active
devices in the connector shell, enabled by
defined power pins.

The CPC5564 64-Bit AMD Opteron™ Single Board Computer

The CPC5564 is the world’s most powerful PICMG® 2.16 compute blade, and the first to be based on
single- and dual-core AMD Opteron™ processors. The AMD Opteron™ processor provides a highly scalable
x86 architecture that delivers next-generation performance as well as a flexible upgrade path from 32- to
64-bit computing. Its multi-core architecture offers advanced processing speed while reducing heat and
power consumption.

With up to 8GB of ECC memory, multiple storage options and Linux, Solaris™ and Windows® operating system support,
the CPC5564 is an ideal computer for high-end packet processing or multi-threaded software environments found in
wireless, softswitch and defense applications.

Is it the superhero of the compute world? We like to think so.

Real-time monitoring for critical patient care.
That's the Power of Windows Embedded.

Getting real-time patient monitoring devices into the hands of doctors and nurses can save lives. That’s why the engineers at Zoe
Medical chose the speed and reliability of Windows CE to develop their Nightingale Personal Patient Monitor (PPM2) and make
it available to hospitals in just 12 months.

With only two developers and a short timeframe, Zoe Medical took advantage of the shared source code in Windows CE to
move its applications from its traditional MS-DOS platform to a system that is more flexible, familiar, and provides the graphic
and audio support its customers demand. Plus, the hard real-time performance of Windows CE met the strict requirements of the
PPM2 to monitor and communicate vital patient functions as they happen.

“We took two critical patient care devices to market in only a year. Windows CE was a
big part of that achievement.”
JIM CHICKERING / Clinical Applications Manager / Zoe Medical Development

The Power to Build Great Devices: get it with Windows CE, Windows XP Embedded, or Windows Embedded for Point of Service.
The primary motivations for the PCIe
cable specification were these:
• Support multiple link width options:
  x1, x4, x8, x16
• Reach cost targets suitable for
  high-volume markets
• Support the I/O usage model


However, unlike that of IB, the I/O 
usage model for PCIe resulted in its cable
specification having defined upstream and 
downstream ends. This, together with the 
presence of sideband reset and clock sig- 
nals, among other factors, complicates its 
use in multi-host systems. The cable spec- 
ification doesn’t address such uses. 

The PCIe cable drives a reference 
clock from the upstream end to the down- 
stream end. If spread spectrum clocking 
(SSC) is used in the system, then the refer- 
ence clock must be used by the PHY at the 
downstream end of the cable. This fits with 
the standard I/O usage model that employs 
a common reference clock throughout 
the system. Multi-host systems, whether 
stacked or bladed, are likely to have an in- 
dependent SSC clock at each host. While it 
is possible to build switch fabrics that can 
handle multiple independent SSC links 
and clocks, this is not required for compliance
with the PCIe specification. Leading
PCIe switch vendors continue to develop
solutions for this problem in response to
growing market demand. PLX, for
example, makes an eight-lane switch that
can be used in a two-port configuration to 
drive a cable, while providing both non- 
transparent bridging and SSC clock do- 
main isolation. 

PCIe is an I/O interconnect that can
be enhanced to connect multiple processors,
while IB, originally intended for
use as an I/O interconnect, instead has
become a clustering interconnect. PCIe
and IB meet and in some cases compete
on the backplane. Both PCIe and IB are
specified as optional backplane interconnect
technologies for ATCA, for example,
where strangely enough, the front-runner
is Ethernet. PCIe is, after all, the chip-to-chip
interconnect used between the processor
chip sets and the IB or Ethernet
controllers used to drive the backplane,
and is used throughout such systems as a
local bus and mezzanine interconnect.

While non-transparent bridging is
a familiar and indeed de facto standard,
the remaining specification infrastructure
for PCIe-linked multi-host systems is still
developing, and switches with multiple
non-transparent bridges or IOV multi-host
capabilities have yet to appear. As time
passes and the PCIe infrastructure is enhanced
by both standard and proprietary
means, PCIe will see increasing use in
multi-host systems. Nevertheless, IB will
remain secure in its niche both because it
can scale well beyond the backplane and
because of the barrier created by all the
clustering software that continues to be
developed around it.


PLX Technology
Sunnyvale, CA.
(408) 774-9064.
[www.plxtech.com].

Powerful processing on small form factor boards.
MPC7447/7448 on PrPMC and 3U CPCI

The e6 series of processors offers powerful PowerPC processors ideal for
applications in industrial automation, communications, medical imaging, and
defense.

The e6 series is ideal for environments where space is at a premium.
Available in commercial, extended temp, and conduction cooled versions.

e6 Series Features:

• Choice of MPC7447A or MPC7448
• Two or three Gigabit Ethernet ports
• USB and multi-function serial ports
• Software compatible with MPC7xx family
• Up to 512 MB SDRAM, 2 MB SRAM
• Support for major RTOS and Linux
• Available conduction cooled

G752 PowerQUICC II PrPMC

Call us or visit our website for details on this and more.

www.acttechnico.com

215-937-9102 or 800-445-6194

PowerQUICC II & III versions available
Dual 6U 2eSST VME Carrier Board
3U or 6U cPCI Carrier Boards
JTAG/COP & RS232 Console
Better than Ultimate?
The ICS-645D puts it all together. 


The ICS-645C was a finalist in the EE Times Ultimate Products program — so imagine how good 
the 32-channel ICS-645D PCI data acquisition card is. It takes the robust, stable ICS-645C platform 
with its differential inputs and four voltage ranges and builds on it, with onboard storage increased 
to 16 MBytes/second. Even better, the ICS-645D now features 64-bit, 66 MHz PCI 2.2, increasing 
PCI bandwidth by a factor of four. Integrated ADC, gain, anti-alias filtering and programmable 
clock all come as standard. 









The benefit? No need to provide separate signal conditioning circuitry in advance 
of the analog to digital conversion — and no need for an auxiliary clock source either. 
It’s all there, right on the board, ready to go. That can mean simpler, more compact, 
more cost-effective systems — and shorter development times. 


ICS. Leading the Way. 


ICS 


SENSOR PROCESSING

ICS-645C chosen as a finalist in the Ultimate
Products supplement of EE Times Feb 2005





CHECK THE FULL SPEC NOW AT www.ics-ltd.com/645d
OR CALL TOLL FREE (US ONLY) 800-267-9794 or 613-749-9241
Accelerate Your Applications
with Hard Real-Time and
Reliable I/O over InfiniBand


Application developers and end users can deploy their applications 
over InfiniBand with minimal to no porting effort and take advantage 
of the benefits the architecture offers in embedded and real-time 
applications. 


by Sujal Das 
Mellanox Technologies 


The InfiniBand architecture defines
a high-bandwidth, low-latency clustering
interconnect that is used for
high-performance computing (HPC) and
enterprise data center (EDC) class applications.
It is an industry standard developed
by the InfiniBand Trade Association.

Since the release of the specification,
the InfiniBand community has been
active in developing software for all major
OS platforms, including Linux open
source software for the InfiniBand architecture.
The Linux software community,
comprised of major suppliers to the HPC
and EDC markets, collaborates on joint
open source software development as
part of the OpenFabrics alliance
(www.OpenFabrics.org, previously called
OpenIB.org).

The open source code for the InfiniBand
architecture has matured to the
point where portions of it are included in
the base Linux kernel. Other modules are
under development and will be completed
and submitted to Kernel.org in the near
future. Red Hat and Novell/SUSE have
announced plans to include InfiniBand
software in their distributions. The code
has been ported to embedded operating
systems such as VxWorks as well.

[Figure labels: Transparent and Standard Application Interfaces; IPoIB, SRP, SDP, iSER, NFS-R, Others]
Interfacing between IP-based Ethernet and InfiniBand hardware.


InfiniBand in Embedded and 
Real-Time Applications 

InfiniBand adoption in embedded 
and real-time applications has grown sig- 
nificantly and continues the strong trend. 
Embedded applications such as ATCA 
chassis, digital image processing systems, 
storage targets and clustered storage sys- 
tems use the high bandwidth (up to 20 
Gbits/s) and low latency (less than 3 usec) 
features of InfiniBand to offer compelling 
price-performance benefits. 

The lossless nature of the InfiniBand
fabric and the end-to-end quality of service
delivery mechanisms available via
its I/O channel architecture make it a
perfect fit for latency- and timing-sensitive
applications used over real-time
operating systems. Financial trading
and analytics for large volumes of market
data, where the speed and timing
guarantee of order execution is of paramount
importance, are examples of such
applications. InfiniBand I/O possesses
no-compromise I/O services characteristics
(hard real-time and no tolerance
for timing delays) that perfectly complement
real-time operating systems that are
bound by the same characteristics for the
services that they provide. This is unlike
TCP-based I/O services such as Ethernet,
where dropping of packets and software-based
end-to-end service implementation
result in unpredictable I/O services.


Software Architecture 

Figure 1 shows the Linux InfiniBand
software architecture. The software
consists of a set of kernel modules and
protocols. There are associated user-mode
shared libraries as well that are
not shown in the figure. Applications that
operate at the user level stay transparent
to the underlying interconnect technology.
The focus here is to discuss what
application developers need to know to
enable their IP, SCSI, iSCSI, sockets or
file system-based applications to operate
over InfiniBand.

Interfacing Linux sockets-based applications to InfiniBand via the
sockets direct protocol.

The kernel code divides logically into
three layers: the host channel adapter (HCA)
driver(s), the core InfiniBand modules,
and the upper-level protocols. The core
InfiniBand modules comprise the kernel-level
mid-layer for InfiniBand devices.
The mid-layer allows access to multiple
HCA network interface cards (NICs) and
provides a common set of shared services.
These include the following:

e User-level Access Modules — The user- 
level access modules implement the 
necessary mechanisms to allow access 
to InfiniBand hardware from_user- 
mode applications. 

• The mid-layer provides the following
functions:

  • The Communication Manager (CM)
    provides the services needed to allow
    clients to establish connections.

  • The Subnet Administrator (SA) client
    provides functions that allow clients
    to communicate with the subnet
    administrator. The SA contains
    important information, such as path
    records, which are needed to establish
    connections.

  • The Subnet Manager Agent (SMA)
    responds to subnet management packets
    that allow the subnet manager to query
    and configure the devices on each host.

  • The Performance Management Agent
    (PMA) responds to management packets
    that allow retrieval of the hardware
    performance counters.

  • Management Datagram (MAD) services
    provide a set of interfaces that allow
    clients to access the special
    InfiniBand Queue Pairs (QPs) 0 and 1.

  • The General Services Interface (GSI)
    allows clients to send and receive
    management packets on special QP 1.

  • QP redirection allows an upper-level
    management protocol that would normally
    share access to special QP 1 to redirect
    that traffic to a dedicated QP. This is
    done for upper-level management
    protocols that are bandwidth-intensive.








  • The Subnet Management Interface (SMI)
    allows clients to send and receive
    packets on special QP 0. This is
    typically used by the subnet manager.

  • The mid-layer provides access to the
    InfiniBand Verbs supplied by the HCA
    driver. The InfiniBand architecture
    specification defines the Verbs. A Verb
    is a semantic description of a function
    that must be provided. The mid-layer
    translates these semantic descriptions
    into a set of Linux kernel application
    programming interfaces (APIs).

  • The mid-layer is also responsible for
    resource tracking, reference counting
    and resource cleanup in the event of an
    abnormal program termination or in the
    event a client closes the interface
    without releasing all of the allocated
    resources.

• The lowest layer of the kernel-level
InfiniBand stack consists of the HCA
driver(s). Each HCA device requires an
HCA-specific driver that registers with
the mid-layer and provides the
InfiniBand verbs.
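
The split between special QP 0, special QP 1 and redirected QPs can be pictured with a small routing table. The following is only an illustrative sketch in Python, not the kernel's MAD interface: the class name, the management class strings and QP number 42 are all hypothetical, and only the roles of QP 0 and QP 1 come from the description above.

```python
from dataclasses import dataclass, field

SMI_QP = 0  # special QP 0: subnet management packets
GSI_QP = 1  # special QP 1: general services management packets


@dataclass
class MadRouter:
    """Toy model of how management traffic is mapped to queue pairs."""

    # Hypothetical table: management class name -> dedicated QP number.
    redirected: dict = field(default_factory=dict)

    def redirect(self, mgmt_class: str, qp: int) -> None:
        """Move a bandwidth-intensive class off the shared QP 1."""
        if qp in (SMI_QP, GSI_QP):
            raise ValueError("a redirect target must be a dedicated QP")
        self.redirected[mgmt_class] = qp

    def qp_for(self, mgmt_class: str) -> int:
        """Return the QP that carries packets of this management class."""
        if mgmt_class == "subnet_management":
            return SMI_QP
        # Everything else shares QP 1 unless it has been redirected.
        return self.redirected.get(mgmt_class, GSI_QP)


router = MadRouter()
router.redirect("bulk_service", 42)
print(router.qp_for("subnet_management"))  # 0
print(router.qp_for("performance"))        # 1
print(router.qp_for("bulk_service"))       # 42
```

The point of the sketch is that redirection is a per-protocol decision: classes with light traffic keep sharing QP 1, and only the heavy ones are given a QP of their own.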


The upper-level protocols, such as IPoIB,
SRP, SDP and iSER, enable standard data
networking, storage and file system
applications to operate over InfiniBand.
Except for IPoIB, which provides a simple 
encapsulation of TCP/IP data streams over 
InfiniBand, the other upper-level protocols 
transparently enable higher bandwidth, 
lower latency, lower CPU utilization and 
end-to-end services using field-proven Re- 
mote DMA (RDMA) and hardware-based 
transport technologies available with In- 
finiBand. The following is a discussion of 
those upper-level protocols and how exist- 
ing applications can be quickly enabled to 
operate over them and InfiniBand. 


Porting IP Applications 

The Internet Protocol over InfiniBand 
(IPoIB) upper-level protocol supports tun- 
neling of Internet Protocol (IP) packets over 
InfiniBand hardware (Figure 2). The proto- 
col is implemented as a standard Linux net- 
work driver and thus allows any application 
or kernel driver that uses standard Linux 
network services to use the InfiniBand 
transport without modification. Linux kernel
2.6.11 and above includes support for the
IPoIB protocol, the InfiniBand Core and an
HCA driver for HCA NICs based on Mellanox
Technologies’ HCA silicon.

[Figure: layered diagram spanning the User, Kernel and Hardware levels,
showing the standard Linux storage stack and the components included in
the Linux kernel.]

The SCSI RDMA protocol enables interfacing between InfiniBand
hardware and the host bus adapter via the SCSI mid-layer.

This method of enabling IP applica- 
tions over InfiniBand is effective for man- 
agement, configuration, setup or control 
plane-related data where bandwidth and 
latency are not critical. Because the ap- 
plications continue to run over the stan- 
dard TCP/IP networking stack, they are 
completely unaware of the underlying I/O 
hardware. However, to attain full perfor- 
mance and take advantage of some of the 
advanced features of the InfiniBand archi- 
tecture, application developers may want 
to use the sockets direct protocol (SDP) 
and related sockets-based API. 


Porting Sockets-Based 
Applications 

The sockets direct protocol (SDP) driver
provides a high-performance interface for
standard Linux socket applications. It
boosts performance by bypassing the
software TCP/IP stack and by transferring
data using efficient RDMA and
hardware-based transport mechanisms.

The CMA layer enables iSER to connect the standard Linux storage stack
to both InfiniBand and iWARP-based RDMA technologies.

An NFS-RDMA client is currently being developed to extend NFS to take
advantage of the RDMA features of InfiniBand.

InfiniBand hardware provides a reliable,
hardware-based transport. As such, the TCP
protocol becomes redundant and can be
bypassed, saving valuable CPU cycles
(Figure 3). Zero-copy SDP implementations
avoid expensive memory copies, and RDMA
helps avoid the context-switch penalties
that hurt CPU utilization, performance and
latency. The SDP protocol is implemented as
a separate network address family. For
example, TCP/IP provides the AF_INET
address family and SDP provides the AF_SDP
(27) address family. To allow standard
sockets applications to use SDP without
modification, SDP provides a preloaded
library that traps the LibC sockets calls
destined for AF_INET and redirects them to
AF_SDP. Applications do not need to change
except to interface with the preloaded
library.
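
The redirection performed by the preloaded library can be sketched in a few lines. This is a hedged illustration in Python, not the library's actual code: the function names are hypothetical, the value 27 is the AF_SDP number quoted above, and limiting the redirect to stream sockets is an assumption made here because SDP stands in for TCP.

```python
import socket

AF_SDP = 27  # address family number quoted in the text


def redirect_family(family: int, sock_type: int) -> int:
    """Mimic the preload library's decision for one socket() call."""
    # Assumption: only TCP-style stream sockets are redirected.
    if family == socket.AF_INET and sock_type == socket.SOCK_STREAM:
        return AF_SDP
    return family


def open_stream_socket() -> socket.socket:
    """Try SDP first; fall back to TCP when the host has no SDP support."""
    family = redirect_family(socket.AF_INET, socket.SOCK_STREAM)
    try:
        return socket.socket(family, socket.SOCK_STREAM)
    except OSError:
        # No SDP address family registered on this host.
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM)
```

An application written this way would have to know about AF_SDP, which is exactly what the preloaded library avoids: it makes the same substitution underneath unmodified socket calls.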


Porting SCSI and iSCSI 
Protocol-Based Applications 

SCSI RDMA Protocol (SRP) was 
defined by the ANSI T10 committee to 
provide block storage capabilities for the 
InfiniBand architecture. SRP is a proto- 
col that tunnels SCSI request packets over 
InfiniBand hardware using this industry- 
standard wire protocol. This allows one 
host driver to use target storage devices 
from various hardware vendors. 

As shown in Figure 4, the SRP upper- 
level protocol plugs into Linux using the 
SCSI mid-layer. Thus, to the upper layer 
Linux file systems and user applications 
that use those file systems, the SRP de- 
vices appear as any other locally attached 
storage device, even though they can be 
physically located anywhere on the fabric. 
It is worthwhile to mention that SRP is 
part of the latest Linux kernel version. 

iSER (iSCSI RDMA) eliminates the 
traditional iSCSI and TCP bottlenecks by 
enabling zero-copy RDMA, offloading 
CRC calculations in the transport layer to 
the hardware and by working with message 
boundaries instead of streams. It leverages 
iSCSI management and discovery facili- 
ties and uses SLP and iSNS global storage 
naming. The iSER specification for
InfiniBand and other RDMA fabrics is driven
within the Internet Engineering Task Force
(IETF), and the IBTA has created an annex
for support of iSER over InfiniBand.

iSER follows the same method of plugging
into Linux using the SCSI mid-layer.
However, as seen in Figure 5, iSER works
over an extra layer of abstraction (the
CMA, Connection Manager Abstraction layer)
to enable transparent operation over
InfiniBand and iWARP-based RDMA
technologies. Applications that interface
with LibC at the user level and the Linux
file systems at the kernel level work
transparently, unaware of what interconnect
technology is being used underneath.


Porting NFS-Based 
Applications 

Network File System (NFS) over 
RDMA is a protocol being developed 
by the Internet Engineering Task Force 
(IETF). This effort is extending NFS to 
take advantage of the RDMA features 
of the InfiniBand architecture and other 
RDMA-enabled fabrics. 

As shown in Figure 6, the NFS- 
RDMA client plugs into the RPC switch 
layer and the standard NFS v2/v3 layer 
in the Linux kernel. The RPC switch 
directs NFS traffic either through the 
NFS-RDMA client or the TCP/IP stack. 
Like the iSER implementation, the 
NFS-RDMA client works over the CMA 
to provide transparent support over In- 
finiBand and iWARP-based RDMA 


technologies. File system applications 
interface to the standard Linux file sys- 
tem layer and are unaware of the under- 
lying interconnect. 


Availability of InfiniBand 
Software 
The Linux-based HCA driver and In- 
finiBand Core are available from multiple 
sources, and they are based on and there- 
fore interoperable with what is developed 
in the OpenFabrics.org community: 
• OpenFabrics.org releases
• Kernel.org kernel releases starting with
  v2.6.11
• Red Hat AS 4.0 Update 3 as a technology
  preview release (and soon to be supported
  as a production release in upcoming
  updates)
• Upcoming Novell SLES 10 update releases


Upper-level protocols and libraries 
are available from the following sources: 
• OpenFabrics.org releases: all except
  NFS-RDMA
• Kernel.org kernel releases: IPoIB and SRP
• Red Hat AS 4.0 Update 3: IPoIB, SRP and
  SDP
• NFS-RDMA client and server:
  http://sourceforge.net/projects/nfs-rdma/


SBS Technologies Inc. has launched the
IB4X-VXWORKS InfiniBand driver for the
VxWorks real-time operating system with
IPoIB and SDP protocols for InfiniBand
HCAs. Available from Metromatics Pty Ltd,
this high-performance InfiniBand driver is
suited to VxWorks real-time applications
that require high-speed data transfers,
such as medical imaging for MRI and CT
equipment, radar and sonar, and other
applications that involve intensive image
processing.

IB4X-VXWORKS complies with the VxWorks
Ready Driver board support package (BSP)
API so that it works across SBS
Technologies’ wide range of Ready Driver
x86 and PowerPC single board computers and
I/O cards. The above reflects the status at
the time this article was composed. Other
real-time platform-based solutions are
currently under development and will be
available in 2006.


Mellanox 











... ready for Extreme Embedded Computing?

ETXexpress products are next-generation
embedded modules based on the PICMG
COM Express standard. ETXexpress
provides the highest performance and
I/O bandwidth available in COMs.

» PCI Express - the elemental data path
» Gigabit Ethernet - for high connectivity
» USB 2.0 - for fast periphery
» Serial ATA - for fast drives
» ACPI - for optimized power management

ETXexpress-CD
» Highest performance state-of-the-art embedded module
» Intel® Core™ Duo processor and Mobile Intel® 945GM Express chipset

Get ready. Get ETXexpress

Visit www.kontron.com/ETXexpress





IndustryInsight


Machine to Machine 





Communicating Machines 
Are Triggering an 
Embedded Revolution 


Embedded devices are getting the ability to autonomously exchange 
information via low-cost communication links. Machine-to-machine 
communication lets developers combine them in large, even vast, 
systems whose functionality is greater than the sum of its parts. 


by Bob Burckle, WinSystems 
Steve Pazol, nPhase 


The introduction of the Internet has
changed the way we communicate and run our
businesses, with more than one billion
people now using it. Yet a similar, but
significantly larger, revolution is poised
to take place in embedded computing with
the advent of machine-to-machine (M2M)
communications. This linkage of machines
has enormous market potential and holds the
promise of dramatically altering and
expanding the landscape of embedded
computing.

[Figure: "M2M Will Extend to an Enormous Device Population" (forecast of
installed base, 2010). Chart labels: 1+ Trillion, 2 Billion and 300
Million; RFID/Sensors (location, humidity, temperature, vibration,
liquid, weight, etc.); Smart Devices; devices including appliances,
machinery, vehicles and building equipment; devices including personal
computers, mobile phones, PDAs and Web tablets.]

By 2010, machine-to-machine (M2M) communications could extend to
a device population well over a trillion, according to estimates from The
Focal Point Group (www.fpgroup.com).

At its heart, M2M simply involves 
devices that can communicate over a net- 
work. These wired or wireless communica- 
tions can be quite simple, offering only the 
ability to send and receive data and simple 
command-and-control parameters over the 
network. The specific network protocol or 
means of connecting to the network are 
not what’s important. The key ingredient 
is the ability of a device to become part of 
something larger, resulting in enhanced in- 
formation flow in numerous environments 
and markets. Market research estimates put 
the potential for such applications at well 
over a trillion devices by 2010 (Figure 1). 

In some ways, M2M resembles tra- 
ditional industrial control. Both involve 
communications among machines. But tra- 
ditional industrial control networks, such 
as supervisory control and data acquisition 
(SCADA), have more restrictive require- 
ments than M2M. Industrial control net- 
works require reliable, real-time commu- 
nications in order to fulfill their purposes. 





Individual SCADA devices depend on sig- 
nals from the network in order to perform 
their functions properly, and they must 
receive information in a timely fashion. In 
many cases, these networks also require a 
human operator to process data, make con- 
trol decisions and issue commands. 

M2M networks have no such restric- 
tions. The primary purpose of M2M net- 
works is not necessarily to control a pro- 
cess, but rather to simply gather and pass 
along data to a central server. As a result, 
M2M devices tend to be relatively autono- 
mous, performing their functions with lit- 
tle or no input from outside sources. This 
allows the networks used to be tolerant of 
delays and even failures in communica- 
tions. In turn, easing the restrictions gives 
M2M networks wider range. Devices can 
be scattered across a large area—even 
worldwide—because they do not depend 
on time-critical information from the net- 
work. Although commands and parameter 
updates may come across the network, ei- 
ther from a human or a central server, the 
function of an M2M device does not de- 
pend on such communications. 
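
One common way to obtain this tolerance is store-and-forward buffering: the device keeps doing its job and queues its readings until the link is usable. The sketch below is a minimal Python illustration under stated assumptions; the class name, the `send` callable and the buffer limit of 1000 readings are hypothetical and not taken from any product described here.

```python
from collections import deque
from typing import Callable


class StoreAndForward:
    """Buffer readings locally and forward them when the link allows."""

    def __init__(self, send: Callable[[dict], None], max_buffered: int = 1000):
        self._send = send
        # Once the buffer is full, the oldest readings are dropped first.
        self._pending = deque(maxlen=max_buffered)

    def record(self, reading: dict) -> None:
        """Called by the device's own logic; never touches the network."""
        self._pending.append(reading)

    def flush(self) -> int:
        """Try to forward buffered readings; stop quietly on a link failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break  # link is down; keep the data and retry later
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Because `record` never blocks on the network, the device's primary function is unaffected by an outage; `flush` can be called on a timer or whenever connectivity returns.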


Low Cost Driving M2M 

The rising interest in M2M systems 
has several drivers. One is the declining 
cost and increasing performance of pro- 
cessors. Many devices that might have 
once used hard-wired logic to provide a 
low-cost implementation can now be run 
as software on an embedded processor 
for the same cost, with the benefit of in- 
creased design flexibility. 
A representative M2M module for mobile, industrial applications, such as 
this WinSystems PCM-GPS, includes either a CDMA or GSM cell modem 
for communications and a GPS receiver for determining location. 


Another factor driving interest in 
M2M is the success of standards for ro- 
bust and versatile communications chan- 
nels. Networks such as Ethernet and pro- 
tocols such as TCP/IP have permeated the 
industry, leading to high-volume produc- 
tion and thus lowering the implementation 
cost of hardware and software. The suc- 
cess has also ensured widespread design 
expertise, easing system development. 

The growth in wireless systems has 
also spurred the development of M2M ap- 
plications. While an M2M system does 
not have to involve wireless communica- 
tions, freedom from wires provides de- 
sign flexibility that greatly increases the 
scope of potential applications. As with 
the networks and protocols, the success of 


wireless systems has lowered the cost of 
hardware implementation through high- 
volume production. Wireless transceiver 
modules are available for numerous stan- 
dards, including Bluetooth, WiFi, cell phone 
and satellite communications (Figure 2). 
The common thread in all of this is 
lowered cost. Many potential M2M appli- 
cations provide value in proportion to the 
number of devices on the network. Auto- 
mated wireless gas meter reading, for exam- 
ple, benefits the gas company by increasing 
the efficiency of meter readers and thereby 
reducing operating costs. The more meters 
the gas company automates, the greater the 
savings. A low per-unit implementation 
cost helps quickly amortize the investment 
needed to first establish the network. 
Adding M2M communications to something as simple as a traffic sign 
allows remote management and message updating, greatly increasing 
the sign’s utility while reducing its operating costs. 


Other M2M applications provide 
benefits that do not directly reduce operat- 
ing costs but instead increase the utility or 
flexibility of a device. A consumer appli- 
ance with communications capability, for 
instance, can receive upgrades or alert ser- 
vice personnel of imminent failures. Both 
activities increase customer satisfaction, 
potentially increasing revenue but without 
providing a direct line-item benefit. Such 
features will not command a substantial 
price premium from consumers, however, 
so they must be inexpensive to implement 
on a unit basis. 

Fortunately, the cost of adding
communications capability to an embedded
system, especially wireless communications,
is low and dropping. This is fueling an
increase in M2M applications that is sure
to accelerate. Any application that
can benefit from the easy exchange of 
information is a candidate for an M2M 
system. This also goes for applications 
that can extend their geographic range or 
their scope by using wireless connections. 
Similarly, applications that are impracti- 
cal because of the difficulty in making 
a wired connection can become feasible 
when wireless connections are used. 


Adding Communications 
Boosts Value 

An example can help illustrate the growth
potential of M2M systems. Consider a
roadside traffic sign, the kind that
flashes road-condition or public service
messages (Figure 3). These
sign systems use embedded processors to 
handle the message display and manage 
battery power. The onboard processor may 
have a selection of pre-defined messages 
in memory that are user-selectable or may 
even have a means of accepting custom 
messages from a human operator. The sign 
system might also be tied in with traffic 
sensors to make an automated traffic con- 
trol system. The sensors would monitor 
the passage of cars and determine if they 
are slowing down, indicating a jam in the 
making. The signs could then automati- 
cally begin announcing the traffic condi- 
tions ahead without human intervention. 

Adding communications capabil- 
ity can greatly increase the sign’s utility. 
As part of an M2M network it can com- 
municate its status to a central office and 
receive commands. Thus, the sign can in- 
form a central office of its need for main- 
tenance or battery replacement. It could 
also accept message downloads from the 
network, eliminating the need for an op- 
erator to update or change messages. Traf- 
fic-detecting signage could also inform a 
central office of local traffic conditions, 
allowing the central office to update the 
signs with messages suggesting alterna- 
tive routes based on the traffic levels that 
those other routes show. 


If the sign were transportable and 
had global positioning system (GPS) 
capability, it could automatically inform 
the central office of its installed location, 
eliminating the need for manual location 
tracking. This would allow operators to 
simply position signs without having to 
register them. The office can customize 
messages to each sign based on its loca- 
tion and coordinate the messages dis- 
played by a series of signs, without any 
operator involvement beyond placing the 
sign. The position information could also 
allow the system to alert police when an 
unauthorized move takes place (i.e., the 
unit is being stolen). 
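
The theft check described above amounts to comparing a sign's reported GPS fix with the location the central office has on record. A minimal sketch in Python follows; the function names and the 50-meter threshold are assumptions chosen for illustration, and a real system would tune the threshold to the receiver's accuracy.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def moved_without_authorization(registered, reported, threshold_m=50.0):
    """True when the reported fix is farther from the registered (lat, lon)
    than the threshold, which is meant to absorb ordinary GPS noise."""
    return distance_m(*registered, *reported) > threshold_m
```

The central office would run this check each time a sign reports in, and clear the alert whenever an operator registers a planned relocation.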

The addition of communications ca- 
pability to the sign’s hardware thus opens 
the door for several new functions and ap- 
plications, expanding on and automating 
the original system’s function. The
addition of GPS expands the system’s
potential even further with a minimal unit
cost increase. Linking in these devices as
part of a larger system creates whole new
applications for the signs beyond their
original use. This, then, is the essence of
the M2M
revolution: a relatively minor addition of 
communications capability unlocks the 
potential for a wide range of application 
possibilities. 


Focus on Technical Strengths 

Developers seeking to capitalize on 
the rising wave of M2M possibilities will 
need to carefully evaluate their technical 
strengths against the myriad communica- 
tions options open to them. These options 
include wired Ethernet, WiFi, WiMAX, 
ZigBee, Bluetooth, cellular wireless, 
satellite communications, POTS (plain 
old telephone system) and TCP/IP-based 
channels. Because these options are stan- 
dards-based, however, many developers 
will want to consider purchasing the ap- 
propriate hardware and software expertise 
and concentrate on building the applica- 
tion software. 

The communications option selected 
for an application will depend strongly 
on the application requirements. Wired 
connections are among the easiest to im- 
plement at the system level, but may not 
be practical for the actual field
installation. As a result, many M2M systems
implement a wireless connection, prompting
the common belief that M2M refers








only to wireless implementations. Many 
M2M systems can be implemented with 
wired communications links, however, if 
device locations already possess or can 
readily accept wiring. It all depends on 
the application. 

When choosing a wireless communi- 
cations channel, the geographic range of 
the installed device base, the cost of the 
communications channel on a per-bit ba- 
sis, and regulatory requirements should all 
be carefully evaluated in the early stages 
of system development. Devices that clus- 
ter close together, as in a single building, 
are candidates for channels such as WiFi. 
When an application needs city- or coun- 
try-wide coverage, however, cellular tech- 
nology may be the appropriate choice. 

Developers should also be prepared 
to spend considerable effort at compliance 
testing when choosing a wireless commu- 
nications link. Governments will demand 
testing to show compliance with spectrum 
usage regulations. Protocol review boards, 
such as the Cellular Telecommunications 
and Internet Association (CTIA) will also 
require compliance testing to ensure a 
device’s compatibility with the network. 
Finally, the carrier service providers, such 
as cellular companies, will want testing to 
ensure a system will not affect service to 
other customers. As with the communica- 
tions hardware and software, developers 
may often choose to purchase the radio 
system elements and expertise and con- 
centrate on application development. 

The ability to purchase system com- 
ponents, which stems from the standards- 
based nature of M2M, is a key compo- 
nent of M2M’s potential. Smaller system 
development companies can buy rather 
than build the elements of their systems 
and then focus their efforts on leverag- 
ing their application expertise. Hardware 
and software development companies can 
focus on providing the system elements, 
meeting the needs of a wide customer 
base. The potential market is large enough 
for companies of all types to carve out a 
niche, including consulting, testing and 
system integration vendors. 

Ultimately, M2M will fundamentally 
change the nature of embedded systems. 
With communications capability readily 
and inexpensively integrated, there will 
be little benefit in having an embedded 
design that cannot communicate. Those 


that can communicate will gain an abil- 
ity to become part of something larger 
than themselves, enhancing their inherent 
functions with all the power that comes 
from the free exchange of information. 
And as the last decade’s experience with 
the Internet shows, the power of
information can be transformational.
Machine-to-Machine 





Internet Protocols Ease 
Development Cost and Time 
for M2M Communication 


IP networks provide many benefits to users of M2M applications. 
Among the benefits are lower costs of infrastructure, operations, 
communications and support, and higher throughput, reliability, 
versatility and security. 


by Alan Singer 
ConnectOne 


Machine-to-machine (M2M) communication
connects non-PC devices to the Internet and
other IP-based networks. M2M is the
fundamental technology of post-PC,
third-generation computing. It provides
remote
access over the Internet to non-PC devices 
such as point-of-sale (POS) terminals, 
fleet management or automatic vehicle 
location terminals, home appliances, data 
loggers, telemedicine devices, industrial 
controllers, utility meters, building con- 
trols, vending machines, SCADA devices, 
and much more. 

These devices, powered by embedded
microcontrollers, number in the
billions and are deployed throughout 
the world. Of course, not every device 
needs to communicate. But where there 
is a real business need, such as improv- 
ing productivity or lowering the cost of 
service, there’s justification to seek the 
most efficient or cost-effective means of 
communication. And, the Internet, which 
exists as a global communication back- 
bone, is the perfect medium for M2M 
communication. 





Using the Internet for M2M ap- 
plications takes much of the pain out of 
the process of collecting and communi- 
cating data. Fleet management systems 
track the inventory, route, tolls, location, 
mileage, engine RPMs, gas consumption, 
brake wear, tire pressure, stops, hours 
driven, etc. The process is done automati- 
cally, freeing the driver from extensive 
paperwork and the company from hours 
of tabulation, review and reconciliation 
of records. Twenty years ago, trucking 
and taxi fleets used radio frequencies in 
the VHF or UHF bands for communica- 
tions—a relatively expensive and analog 
system. Today, most fleet management 
terminals use IP-based cellular digital 
networks such as GPRS and CDMA2000 
for communication, which make it much 
easier to integrate data into an enterprise’s 
IT system. 

Market research indicates that M2M 
will be an active segment in the embed- 
ded market. According to the Wireless 
Data Research Group, between 2004 
and 2008, the market for hardware, soft- 
ware, professional and wireless network 


services for machine-to-machine commu- 
nications will grow at a 27% compound 
annual growth rate, from $9.3 billion to 
$31 billion. The Gartner Group predicts 
that by 2007, there will be between 100 and
160 million machine-to-machine connections
worldwide that use wireless mobile
phone networks. According to the Focal 
Point Group, 880 million new M2M-en- 
abled devices will be produced annually 
by 2010. 
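
As a quick check on the growth figures quoted above, the compound annual growth rate can be recomputed from the two endpoints. The sketch below is in Python; the number of compounding periods is an assumption, since the forecast does not say whether the 2004 to 2008 span is counted as four or five annual periods.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate over the given number of periods."""
    return (end_value / start_value) ** (1 / periods) - 1


# Market size in billions of dollars, as quoted in the text.
for periods in (4, 5):
    print(periods, f"{cagr(9.3, 31.0, periods):.1%}")
# 4 35.1%
# 5 27.2%
```

The quoted 27% rate is therefore consistent with five compounding periods; over four periods the same endpoints would imply roughly 35% a year.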


Where’s the Challenge? 

Device communication needs to be 
rock solid over a medium that is dynamic 
and unpredictable. The Internet is a rap- 
idly evolving and often unstable network 
environment. Methods of connectivity and 
communication and security protocols 
all change as technology advances. As 
new protocols are introduced, a device’s 
communication application needs to be 
updateable since most industrial devices 
Communication options for IP-enabling an M2M application. 


have a long in-field product life. Remote 
updateability of the communication tasks 
is very important since many devices are 
located in hard-to-access locations or are 
mobile. The Internet offers the possibility 
to remotely update the device application 
unattended. 

Deployed devices can benefit from 
being retrofitted with IP connectivity. 
Elevators or vending machines installed 
more than 20 years ago, for instance, were 


not designed with any expectation of con- 
nectivity. To bring them online requires a 
solution that works with older, stand-alone 
technology. In addition, new devices need 
to be designed so that their application can 
be updated remotely and will not need ret- 
rofitting as the communication protocols 
advance. 

Obviously, enabling an existing or 
new design to include IP connectiv- 
ity requires that the solution fits within 


Open Systems Interconnection (OSI) Reference Model
(upper layers listed first, lower layers last)

  Layer 7   Application
  Layer 6   Presentation
  Layer 5   Session
  Layer 4   Transport
  Layer 3   Network
  Layer 2   Data Link
  Layer 1   Physical





OSI Reference Model. 


Figure 2 


48 Five June 2006 


limited resources. To succeed, M2M 
devices depend on minimizing the time 
and level of engineering resources re- 
quired. The developer has tight fiscal 
budgets to meet. And, of course, the de- 
vices, particularly those based on older, 
stand-alone technologies, have limited 
computing resources (both processing 
power and memory). 


So, How’s It Done? 

As in the desktop world, connectiv- 
ity comes in a variety of formats. Simi- 
larly, depending on the technology avail- 
able in the specific region, connectivity 
might be achieved by dial-up, wireless 
modem, 802.11a/b/g wireless LAN, or 10 
and 10/100BaseT Ethernet LAN. A global 
network of devices could quite conceivably 
involve a mixture of technologies based on 
those prevalent in the specific areas of the 
world where the various devices are located. 

There are five main alternatives for 
Internet-enabling a device: 

1. Buy or develop a TCP/IP software 
stack, or download shareware from 
the Internet and integrate the protocol 
stack into the host application. 

2. Buy a microcontroller that includes 
a bundled TCP/IP stack and runs the 
application. 

3. Buy a wireless modem with an em- 
bedded TCP/IP stack. 

4. Buy an Internet controller chip that 
offloads the Internet tasks from the 
MCU. 

5. Buy a device server to IP-enable a
deployed device.


Software libraries and shareware 
have long been available for developers 
who wish to add the TCP/IP protocol 
stack to their application as a “do-it-your- 
self” solution. Processor manufacturers 
today are bundling TCP/IP stacks with 
their silicon. Wireless modem manufac- 
turers also bundle support for TCP/IP in 
their modems. All these solutions require 
that you have some Internet programming 
expertise. On the other hand, Internet con- 
trollers and device servers enable custom- 
ers to use their current application and 
hardware with minimum or no redesign 
or reprogramming. Internet controllers do
this cost-effectively and efficiently, and
offer maximum flexibility (Figure 1).








The Protocol Factor 

The Internet is a loosely organized, 
international collaboration of autono- 
mous, interconnected networks that use 
TCP/IP as the protocol for host-to-host, 
peer-to-peer and peer-to-host commu- 
nication. Internet connectivity requires 
conformity with Internet standards by 
service providers who furnish access to 
the Internet, hardware manufacturers 
and software developers. 

Because the Internet is based on 
open standards (as opposed to propri- 
etary protocols), it offers the opportu- 
nity of a ubiquitous, low-cost medium 
for communication. With TCP/IP as 
the basic communication protocol, any 
company can develop a product that can 
interoperate and “talk” with products 
developed by other companies using 
Internet protocols. Thus, the Internet is 
the ideal medium for networking M2M 
applications. 

The Internet offers many methods 
to send and receive data, images, au- 
dio, video and other files. The Open 
Systems Interconnection Reference 
Model classifies communication pro- 
tocols, applications and physical media 
into seven layers, built one on top of 
the other. The model helps in under- 
standing networks and in developing 
products that can communicate with 
each other (Figure 2). 
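
The layering can be written down as a small data structure, which makes the "built one on top of the other" relationship explicit. This Python sketch is an illustration only; the names `OsiLayer`, `PROTOCOL_LAYER` and `runs_on_top_of` are hypothetical, and the table lists just three protocols whose conventional placement is well established.

```python
from enum import IntEnum


class OsiLayer(IntEnum):
    """The seven OSI layers, numbered from the physical medium upward."""
    PHYSICAL = 1
    DATA_LINK = 2
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7


# Conventional placement of the core Internet protocols.
PROTOCOL_LAYER = {
    "IP": OsiLayer.NETWORK,
    "TCP": OsiLayer.TRANSPORT,
    "UDP": OsiLayer.TRANSPORT,
}


def runs_on_top_of(upper: str, lower: str) -> bool:
    """True when `upper` sits in a higher layer than `lower`."""
    return PROTOCOL_LAYER[upper] > PROTOCOL_LAYER[lower]


print(runs_on_top_of("TCP", "IP"))  # True
```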


TCP/IP Stack 

Most likely, an M2M device devel- 
oper will want to add the IP stack to the 
application, especially if the application 
uses a 32-bit processor. This option re- 
quires the developer’s organization to 
have or to subcontract embedded Internet 
programming expertise in order to avoid 
time and cost overrun and to ensure reli- 
ability in every network. If the user has 
some control over the network, if con- 
nections are predictable and if only one 
or a few TCP or UDP sockets are used, 
this is a relatively easy task. 
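
To give a sense of how small the "easy" case can be, here is a minimal sketch of a device reporting one reading over a single UDP socket. It uses a desktop-style sockets API for illustration only; the function names, the payload format and the loopback "server" are assumptions made for this example, and an embedded stack would expose its own calls.

```python
import socket


def send_reading(host, port, reading):
    """Send one reading as a UDP datagram; returns the number of bytes sent."""
    payload = reading.encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


def demo():
    """Loop one datagram back to a stand-in server on the local machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server.settimeout(2.0)             # do not hang forever if nothing arrives
        port = server.getsockname()[1]
        send_reading("127.0.0.1", port, "temp=23.5")
        data, _addr = server.recvfrom(1024)
        return data.decode("ascii")


if __name__ == "__main__":
    print(demo())
```

The sketch works because the network is fully under control. None of the retry, timeout, reconnection or address-resolution handling that an uncontrolled network demands appears here.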

If, however, the device will be used 
on multiple networks where connectivity 
is unpredictable and not controllable, then 
the task is not so simple. An embedded 
design must be ruggedized to account for 
many different implementations of the 
Internet protocol standards or Request 
for Comments (RFCs) on servers located
in disparate networks, Internet Service
Providers (ISPs), wireless operators and
telecom operators.

Block diagram of an example of an offload controller.

At the embedded hardware level, a
design without a 32-bit processor may
leave the host CPU underpowered to run
the low-level TCP/IP protocols, upper-layer
Internet protocols, security protocols
(if needed) and the application
simultaneously, particularly if there is a
lot of data to send or the processor is slow.
It may be necessary to add memory to
store the Internet protocols, and to add
buffers if the device lacks adequate
memory resources or if more than
one socket will be used.
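
A back-of-the-envelope estimate shows why the socket count matters. The sketch below assumes a simple model in which the stack needs a fixed amount of RAM plus one transmit and one receive buffer per socket. The model and every figure in it are hypothetical, chosen only to show how the total scales.

```python
def stack_ram_bytes(sockets, tx_buf, rx_buf, fixed_overhead):
    """Estimate RAM for an IP stack: fixed state plus per-socket buffers.

    All inputs are in bytes except `sockets`, which is a count.
    """
    if sockets < 0:
        raise ValueError("sockets must be non-negative")
    return fixed_overhead + sockets * (tx_buf + rx_buf)


if __name__ == "__main__":
    # Hypothetical figures: 8 KB of fixed state, 1460-byte buffers each way.
    for n in (1, 4, 8):
        total = stack_ram_bytes(n, tx_buf=1460, rx_buf=1460, fixed_overhead=8192)
        print(n, "socket(s):", total, "bytes")
```

Under these assumed numbers, going from one socket to four nearly doubles the RAM requirement, which on a small microcontroller can be the difference between fitting and a board redesign.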

Taking these steps may require a to- 
tal hardware redesign in order to IP-en- 
able the product. It is not recommended 
to mingle the application with the Inter- 
net connectivity tasks because the dy- 
namic nature of the Internet requires fre- 
quent updating of the Internet protocols 
and configuration parameters, while the 
application tends to be stable and does 
not often require updating. In any case, 
a methodology must be implemented for 
safe remote updating of the Internet con- 
nectivity application. 
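
One common way to make remote updating safer, sketched below under stated assumptions, is to verify a digest of the downloaded image before activating it and to keep the previous image as a fallback. The class and method names are hypothetical and the image is just a byte string; a real design would also authenticate the update, which a plain hash does not do.

```python
import hashlib


class ConnectivityFirmware:
    """Holds the active connectivity image and one fallback copy (illustrative)."""

    def __init__(self, image):
        self.active = image
        self.previous = None

    def apply_update(self, new_image, expected_sha256):
        """Activate new_image only if its SHA-256 matches; return True on success."""
        digest = hashlib.sha256(new_image).hexdigest()
        if digest != expected_sha256:
            return False              # corrupted or truncated download: keep the old image
        self.previous = self.active   # remember the known-good image
        self.active = new_image
        return True

    def roll_back(self):
        """Return to the previous image if one exists; return True on success."""
        if self.previous is None:
            return False
        self.active, self.previous = self.previous, None
        return True
```

Because the connectivity code is kept apart from the device application, an update or rollback of this kind does not touch the application at all.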

Using an off-the-shelf IP soft- 
ware stack or developing one’s own IP 
stack may require a lot of integration, 
customization, modification, testing and 
fine-tuning. This is especially tricky if 
upper-layer Internet protocols or Internet 
security protocols are required. In taking
this route, developers must maintain the
IP stack or have a maintenance contract 
with the software vendor. There is no 
guarantee that a stack that works in the 
lab will function reliably in every net- 
work around the world. If Internet con- 
nectivity is not the developers’ expertise, 
this is a very risky undertaking. Buy- 
ing a software stack is the least risky of 
these choices, but developers again need 
to maintain the protocols to provide the 
fastest support to their customers. 


Connectivity Processors, 
Modems and Device Servers 

Several manufacturers offer 
microcontrollers (MCUs) that include 
Internet protocols and are designed to 
run the application simultaneously. Bun- 
dled solutions may be adequate for some 
MCUs and Internet connectivity tasks, but 
mingling the Internet protocols and appli- 
cation may degrade CPU performance, 
depending on the microcontroller’s pro- 
cessing power and speed, the amount of 
data transmitted, the protocols used and
whether they are fully implemented.

Because of limitations on internal 
chip memory and available processing 
power, on-chip Internet protocols are of- 
ten scaled down. It’s important to check 
that the customer’s desired protocols are 
supported and are RFC-compliant. Since 
most MCUs have limited internal mem- 
ory, it may be necessary to increase the 
target board’s memory. Finally, because
an MCU vendor’s expertise is in silicon,
not in Internet communication, the
amount and quality of support for the IP 
stack may be limited. All this can add up 
to additional cost in terms of hardware 
and development time. 

Most major manufacturers of wire- 
less modems today offer modems with 
some implementation of the Internet 
protocol stack. These modems still re- 
quire developers to add Internet com- 
mands and configuration parameters 
to their applications. In any case, users 
must carefully evaluate the capability 
of the modem to perform the required 
IP communication tasks. Some IP-enabled
modems offer only one socket,
while other modems include some up- 
per-layer protocols. 

Some wireless modems enable de- 
velopers to write their application on 
the modem processor. This may be fine 
if the application is a new one, but if the
application already exists, it may not 
make sense to rewrite the application 
on the modem. Also, if the developer 
does not have Internet programming 
expertise, this is a time-consuming and 
inefficient choice. Most wireless mo- 
dems have a limited amount of memory 
and therefore include a minimal imple- 
mentation of the IP stack and may not 
be able to adequately buffer the data 
if there is a lot of data to send. Many 
wireless modems do not offer the user 
much flexibility or control over the con- 
nection. Therefore, they require a great 
deal of development work in order to 
assure reliable connectivity. 

An alternative to buying an IP-enabled
wireless modem is to use a standard
wireless modem with a wireless
Internet adapter to provide the IP
connectivity. In this case, the customer can
choose the most suitable wireless mo- 
dem from the many modem manufac- 
turers. This solution is practical when 
customers don’t want to change their 
current wireless modem or application, 
or if the functionality available on IP- 
enabled wireless modems does not meet 
their requirements. Since this solution 
is more expensive than an IP-enabled 
wireless modem, the Internet adapter 
must offer higher functionality. 


Offloading Connectivity 

Just as there are controller chips 
that offload certain functions from a 
host processor, such as VGA, DMA and 
interrupt controllers, an Internet con- 
troller offloads Internet communication 
tasks from the host processor. The role 
of the Internet controller is to mediate 
the connection between the host and 
the Internet via the physical medium. 
An Internet controller enables develop- 
ers to modify their existing design with 
minimal changes to the hardware and 
minimal or no changes to the applica- 
tion. They can use the existing proces- 
sor, memory and application, and just 
add a few commands to set the Internet 
configuration parameters and to activate 
Internet communications (Figure 3). 
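
The "few commands" approach might look something like the sketch below, in which the host application writes configuration settings to the controller over its existing serial port and then asks it to go online. The `SET name=value` and `CONNECT` syntax, the parameter names and the `FakeSerialPort` class are all invented for this illustration; an actual Internet controller defines its own command set.

```python
def build_setup_commands(params):
    """Turn configuration parameters into one command line per setting.

    The "SET name=value" and "CONNECT" syntax is hypothetical.
    """
    lines = []
    for name, value in params.items():
        text = str(value)
        if "\r" in text or "\n" in text:
            raise ValueError("parameter values must be single-line")
        lines.append(f"SET {name}={text}\r\n")
    lines.append("CONNECT\r\n")   # hypothetical "activate communications" command
    return lines


class FakeSerialPort:
    """Stand-in for the host's UART; records what would be written."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)
        return len(data)


def configure(port, params):
    """Send every setup command to the controller through the given port."""
    for line in build_setup_commands(params):
        port.write(line.encode("ascii"))


if __name__ == "__main__":
    port = FakeSerialPort()
    configure(port, {"APN": "example.net", "SERVER": "192.0.2.10"})
    for chunk in port.written:
        print(chunk)
```

The host code contains no protocol logic at all: it only formats strings and writes them to a port it already has, which is what keeps changes to the existing application small.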

An Internet controller offloads In- 
ternet connectivity tasks from a host pro- 
cessor, enabling it to exclusively and effi- 
ciently run the device application. When 
offloading the Internet connectivity, customers
can keep their current application and
operating system and remain focused
on their area of expertise, which is the
device itself. Offloading eliminates the 
possibility of having to update the oper- 
ating system, CPU and memory in case 
the customer wishes to add new Inter- 
net functionality or protocol support to 
the application in the future, which also 
eliminates the need for a major rewrite of 
the application. 

Because the Internet is a complex, 
dynamic and inconsistently implemented 
medium that is constantly evolving, an IP 
connectivity solution must be as dynamic 
as the Internet. It must be adaptable, sim- 
ple to use and maintain, yet sophisticated 
enough to deal with the Internet’s inherent 
inconsistency. The challenge to embedded 
designers is to develop a solution cost-effectively
and in a timely manner.

RTC Interviews
Bill Kehret, CEO 


Themis Computers 


RTC: There seems to be a trend in the 
embedded computer industry to attempt 
to supply more and more complete sys- 
tems to OEMs, thus climbing up the food 
chain in hopes of increasing revenue 
with the additional parts of the systems 
supplied. Many companies have done 
this through acquisitions that allow the 
company to provide additional parts of 
systems. Do you believe the industry will 
continue to consolidate in this way? At 
what point do you think embedded com- 
puter suppliers will begin competing 
with their customers? Will a time ever 
come when there are only a small hand- 
ful of system makers and the traditional 
merchant board market will go away? 


Kehret: Vertical integration can add rev- 
enue for the supplier, reduce logistics over- 
head for the procurer and potentially reduce 
integration risk. If the procurer perceives 
that the supplier is adding value (“one-stop
shopping”), the relationship is strengthened
and the supplier may be granted a
gross margin premium for his “commodity.”
The procurer, often a first-tier prime
contractor or system integrator or OEM, 
adds value by using his application domain 
knowledge and, perhaps, proprietary hard- 
ware and software to create a differentiated 
system solution for the end customer. This 
mutually beneficial relationship usually 
means that the supplier is expected to re- 
main weakly differentiated from his com- 
petitors (remain a commodity supplier), 
ensuring that the procurer can continue to 
enjoy competitive pricing from alternative
suppliers. Problems arise when the supplier 
attempts to rebalance the value-add stack. 


Rebalancing the value-add 
stack between the embedded 
computing supplier and his
first-level customer may reduce the
total cost to the integrator’s cus- 
tomer, but it’s usually not in the 
integrator’s interest to have his 
value-add ratio reduced. That’s 
a nice way to say that he doesn’t 
welcome competition for his 
value-add ROI. Usually, this isn’t 
a problem, because the embed- 
ded computer supplier has a difficult time 
managing, selling and supporting a broad 
functional range of products, diluting his 
application domain knowledge. In other 
words, it is very hard for a broad-range supplier
to have the competence to compete in
the deep vertical niches that the traditional 
primes dominate. But, this is a two-way 
street. The primes have increasingly out- 
sourced much of their deep technology, so 
they are now obliged to rely on Integrated 
Product Teams (IPTs) to provide complete 
solutions for their end customer. 

Themis’ strategy is to add incre- 
mental value, as an Integrated Product 
Team leader, for mission-critical embed- 
ded computing. Our domain knowledge 
is kinetic and thermal management and 
resource management for server and I/O 
consolidation. The benefits for our cus- 
tomer, and the end user, are reduced life 
cycle cost of ownership. The key to conflict 
avoidance with the “prime” is to add value 
by reducing the cost of maintenance and 
technology refresh/insertion. We also 
scrupulously adhere to open standards, 
supporting and interoperating with all 
existing hardware and infrastructure
software platforms. Themis does this by
supporting and driving standards with such 
organizations as OMG and VITA. 


RTC: As we go to press, the European 
Union’s directive on the Restriction of 
Hazardous Substances (RoHS) will just 
be going into effect. It’s been speculated 
that it may cause problems as many leaded 
components go to end of life. This is caus- 
ing a lot of confusion in many areas where 
the RoHS regulations don’t necessarily 
apply but the availability of many compo- 
nents—but not all—is disappearing. How 
will the transition through mixed invento- 
ries with no clear distinction in part num- 
bers or even appearance, affect our indus- 
try? What work-arounds—if any—do you 
envision? Can programmable logic take 
up the slack? Will there be other advan- 
tages if that’s the case? 


Kehret: Themis enjoys market and product
exclusions to RoHS. Having said that,
we are very focused on driving all new
products to RoHS compliance. Where we 
have extended life cycle programs in place 
with key customers, we attempt to inven- 
tory complete kits, or completed assem- 
blies, to avoid the risk of mixing compo- 
nent technologies. We’re very concerned. 
Even certification by the best assembly 
houses in the industry can’t guarantee 
that improperly marked parts don’t enter 
the supply stream. Themis believes that a 
rapid transition to RoHS compliance is the
best way to reduce the risk of mixed pro- 
cessing, and when combined with supplier 
certificates and segregation of kit and WIP 
inventories for legacy product, can provide 
the best overall level of risk management. 


RTC: VMEbus has been a mainstay in 
the military and aerospace markets for 
the past 20-plus years. And while there 
have been mid-life kickers to upgrade 
the technology, it would appear to be 
reaching the end of a long run. Do you 
believe VME can continue to be a viable 
approach throughout and beyond this 
decade? Will other, newer standards 
such as VITA 46, 51 and others rapidly 
displace VME? On what timetable do 
you expect to see this happening? Why? 


Kehret: The long life cycles of military 
platforms pretty much guarantee a long 
exit ramp for VME64 backplane-compat- 
ible technology. The fact that new military 
programs are still designing-in parallel 
bus VME backplane ecosystems, tells me 
that the eventual transition will take more 
than ten years. VITA and the VME eco- 
system vendors have worked hard to adapt 
best of breed interconnect technology for 
use in mission-critical environments. So 
far that has resulted in a divergence of se- 
rial protocols, which complicates the life 
of system integrators. The benefit to inte- 
grators and end customers alike is that the 
shelf form-factor doesn’t have to change. 
The war-fighter’s 6U shelf or hotel space 
doesn’t have to be re-architected. 

An added architectural benefit is that 
equipment shelves can be interconnected, 
locally and remotely, with low-latency serial
links. VITA 46 (VPX) acknowledges
that switched serial protocols (in our
estimation, PCIe) will be the dominant LSI
building-block interconnect system at the
I/O level. Given increasingly high levels
of chip-level integration (SoC), PCIe and
yet another serial bus du jour will
quickly dominate at the board edge, even
for mezzanine modules (VITA 42, XMC).
Power density scaling pretty much demands 
conduction cooling and opens the door for 
flowthrough liquid cooling and thus VITA 
48 VPX-REDI (Ruggedized Enhanced 
Design Implementation). What this means 
for VME is that the old Euro Standard 6U 
form-factor has lots of life left in it. This is 
important because in both 3U and 6U for- 
mats it occupies an important rung on the 
volumetric modularity ladder. Modularity, 
thermal and kinetic management are where 
VME shines, and these new specifications 
extend the commodity reach for this popu- 
lar embedded computing space. 


RTC: The VME Standards Organiza- 
tion recently started up a committee for 
VITA 56, which is defined as a front- 
panel access, live insertable mezzanine 
card that will be available in versions 
to handle full military shock and vi- 
bration requirements. The dimensions 
and capabilities are somewhat simi- 
lar to the AdvancedMC. Do you think 
there’s room for two similar mezzanine 
form-factors—one for rugged applica- 
tions, the other for more benign jobs? 
For that matter, as the military, for
example, moves to a more network-centric
structure, might systems based on
ATCA and, perhaps in the not-too-distant
future, MicroTCA migrate from
the commercial world to some military 
applications—particularly shipboard 
and land-based? Does Themis have 
any plans to join the ATCA/MicroTCA 
marketplace? Why or why not? 


Kehret: Well, we're not particularly 
pleased to have board edge finger connec- 
tors, sans connector shells, migrate into 
the VME space. This is a giant step back- 
ward in connector integrity. Most of the 
defensive writing we’ve seen focuses on 
the surface mount compression connection 
reliability, but that’s not the focus of our 
concern. Rather, we are concerned with 
the mechanical stress path, which isn’t me- 
diated by strong mating connector shells. 
Fortunately, the VPX wafer-style con- 
nectors do address this requirement. The 
problem with using less robust connectors 
in open standards is that there is no control 


over the way independent board vendors im- 
plement the standards, or control fingerplate 
tolerances. The relatively lower mass and 
shorter stress paths of mezzanine cards do
move the first resonance mode frequency 
up, significantly, a mediating benefit for this 
application. However, lateral stress paths do 
cause blade-mating surfaces to scrub, reduc- 
ing the reliability of the mated contact. 

Connector reliability is a big deal for 
Themis and its customers. A major part of 
our program to mitigate connector reliability 
concerns is a regimen of stress-to-fail and 
Highly Accelerated Life Testing (HALT), 
during the design phase and 100% vibra- 
tion screens during manufacturing, both at 
the board and system levels. Further, we do 
continuous life testing, with sampled Highly 
Accelerated Stress Screening (HASS), to en- 
sure quality and reliability of our products. 
Board-level interconnects both add overhead 
cost per function and reduce reliability. Our 
test regimes help improve reliability, but 
there’s little to be done for the increased 
component cost of mezzanine modules. 

Regarding ATCA and related ecosys- 
tems, Telecommunications and Data Com- 
munications markets have demonstrated an 
appetite for this large form-factor standard. 
As implied above, it’s another, now estab- 
lished, rung on the shelf-level volumetric 
modularity ladder. Themis has products 
under development that will push comput- 
ing density in these form-factors. We’re less 
excited about MicroTCA, for the reasons
elaborated above, but we leave the mar- 
ket to sort this out. We do have aggressive 
plans for the AMC form-factor, and our ex- 
tensive Symmetric Multiprocessing (SMP) 
and chip-level multi-threading experience, 
at the hardware and OS levels, should make 
us a strong competitor in this market. 


RTC: As processors and attendant systems 
have become increasingly powerful, there 
has been a need to develop clever cooling 
techniques. Themis has been a leader in 
a variety of techniques over the years, including
mist cooling. What are the challenges
today, and how are Themis and other
companies addressing the problems?


Kehret: Thanks for asking! We think there’s 
a lot more life left for air-cooled systems, 
and conduction cooling is a key to extending 
the life of air-cooled systems. The question 
is, Where to put the heat sink? We like to use
as much of the board surface area as possible
for the heat exchanger (heat sink) on our
high-power-density boards. Actually, we’ve 
been doing this for more than ten years. 
We pioneered four-way SMP servers in the 
VME form-factor, using discretionary pins 
and 12V power to get the power into the slot 
and wall-to-wall heat sinks that slide into the 
VME board card guides. Interestingly, VITA
46/48 takes a page out of this book and adds 
liquid flowthrough cooling in the bargain. 
We've been doing flowthrough-cooling for 
several years too, with our Slice modular 
switched computing, but that’s beyond the 
scope of this VME discussion. 

These “unitary” heat sinks create a whole 
separate set of problems for the mechanical 
designers. The naive design approach runs 
afoul of thermal, as well as shock and vibra- 
tion stress on the chip-to-board interfaces, 
regardless of bonding technology. Themis’ 
extensive stress screens, continuous life test- 
ing and accelerated life testing are part of our 
“customer commitment” to quality and reli- 
ability, an essential gatekeeper and counter- 
poise to design innovation. 

The trouble with bus and board open 
standards is the freedom that vendors have 
for interpretation. ATCA especially suffers 
from all the choices of how to use cooling air 
channels on the board. Our industry likes to 
go crazy with modularity. The downside of
this is that the air channels get messed up by
a proliferation of AMCs. Little if anything is
done to control airflow across downstream
modules, and the AMCs block much of the 
remaining surface area available for heat ex- 
changers. At the level of the shelf, or chas- 
sis, there is very little effort made to regulate 
the available slot-to-slot inlet air pressure, 
so much of the touted power density head- 
room is fiction. These problems put a special 
burden on the system integrator and can, in 
part, be mitigated by the ecosystem vendor, 
but it means the vendor that builds the high- 
power-density boards also needs to design 
the shelf/chassis and its cooling system. 

But Themis isn’t only a story about high- 
performance, high power density computing. 
We're hard at work on several computing ar- 
chitectures that push high performance way 
down the thermal power scale. That should 
have benefits for Space, Weight and Power 
(SWAP)-challenged applications. 


RTC: VME has been a staple in Themis’ 
embedded computer strategy for many
years, serving the communications, industrial
and military markets. However,
the business is changing radically. What 
was once the number one VME supplier 
(Motorola) has gone end-of-life on many 
of its designs, and according to rumors, 
has backed off on most new development. 
What was the number two maker of VME 
hardware (Force Computers) abandoned 
the architecture and was acquired by 
Motorola—the combined company is now 
focusing significantly on ATCA and com- 
munications strategies. While the VME 
market remains relatively strong, 1) can 
it continue to be a viable force in the em- 
bedded computer board and subsystem 
market in light of growing competition 
and 2) will a critical mass of suppliers 
continue to offer the variety of products 
required? 3) Does there continue to be a 
market for VME outside the military? 


Kehret: OK, as you suggest, there are really 
several ideas in play regarding the longevity 
of the “VME” market. We believe that VME 
is more than VME64 or any other particular 
flavor. In the broadest sense, I think the mar- 
ket thinks of VME as the 6U/3U Euro Stan- 
dard card and shelf packaging dimensions. 
Backplane compatibility, of course, makes 
all the difference in terms of a variant’s sur- 
vivability. The splintering of 6U/3U, with the 
emergence of cPCI, was really the industry’s 
reaction to a lack of responsiveness by the 
standards groups to extend and refresh back- 
plane protocol and connector-based I/O con- 
straints. In any case, the 3U/6U VME eco- 
system is, for all practical purposes, here to 
stay. The viability of particular specifications,
and “dot”-level variants, will be worked out
by market forces. The important point is that
VITA and other standards organizations, 
through their endorsement, are providing 
enough flexibility in the ecosystem for it to 
mutate and survive. Whether it thrives is up 
to the embedded community, their ingenuity 
and perseverance. 

The question of critical mass in the 
ecosystem is largely answered above. We 
think the VME ecosystem will actually 
make gains at the expense of cPCI and 
ATCA, in non-military markets. Will MicroTCA
steal market share from 3U VME?
That’s hard to say. As I’ve discussed above,
the mechanical/electrical integrity of the
AMC board edge connectors is seriously
compromised for high shock and vibration
applications. I rather suspect there will be
parallel universes for ATCA, its ecosystem 
and the evolving VME ecosystem. Once 
in a while there will be some design win 
crossovers, with an attendant flurry of con- 
cern, as expressed in the trade media, but 
for the most part, the two ecosystems will 
remain vital standards for their respective, 
highly differentiated markets. 

Where do VME customers go when 
they leave the bus-and-board market? 

While you didn’t ask this question, it’s 
a logical extension. The answer is, more 
blade servers and all-in-one boxes. Com- 
mercial blade servers are outperforming 
ATCA at lower price points, and while 
they are no match for VME—in terms of 
robust service—they are great for server 
consolidation. Themis has been working 
on its Slice Switched Computing initiative 
for several years. This product family has 
the advantages of VME, in terms of cost 
of ownership, thermal and kinetic man- 
agement, but also the benefits of a larger 
format board (think rack slice), vastly im- 
proved cooling and extensibility, while 
matching or winning on SWAP. The mar- 
ket has learned that COTS is good. VME- 
like attributes at commodity prices, should 
give Themis an “unfair” advantage in the 
server consolidation markets. Slice and 
Slice-Lite are our answers for those who 
would leave the bus-and-board fold. 


RTC: Switched fabric technology ap- 
pears to be something that’s going to 
happen—it’s just a matter of when. 
Several variants from ATCA and AMC 
to other approaches such as VITA 
36 and VITA 41 are emerging, as is 
CompactPCI Express. Do you see any 
of these emerging as critical technolo- 
gies over the next several years? Which 
one(s)? How will the emergence of hybrid
and non-VME products impact
the backplane and packaging business? 
Will the proliferation of many different 
architectures cause the lack of critical 
mass and deterioration of the market? 


Kehret: VITA 41 is an incremental ap- 
proach to serial fabrics. While its use of 
a high-performance wafer-style connector 
is a good idea, the new connector comes 
at the expense of board footprint and rout- 
ing channels. It’s a useful evolutionary step 
for those integrators who need some serial
connectivity but need to preserve their legacy,
proprietary VME boards. For now, we
think it’s a rather narrow, niche market. 

VITA 36 is an important complement to 
the PMC spec for those who need rear con- 
nector I/O. Those who need it, usually because 
the front panel is too crowded, or those who 
want cable management, really get a benefit. 
Often the VME baseboard vendor also con- 
trols the other end of this link, so it’s not much 
of a problem for the industry at large. 


RTC: Themis goes back a long way in the 
embedded computer industry, and over 
the years our editors have always enjoyed 
your tremendous insight and perception 
of the industry. Embedded computer 
technology has continued along a growth 
curve following Gordon Moore’s Law. 
Advances such as multicore processors 
and developments in programmable logic 
continue to add speed and density. Can 
you give us any sense of what applications 
you envision for embedded computers, 
say, a decade away? Twenty years? 


Kehret: I’ll offer the usual caveats about
technology forecasting. Technology forecasting,
using auction and pseudo markets,
is a good way to improve accuracy
through community participation, but 
individual prognosticators are often way 
off base. Still, in the spirit of your ques- 
tion, I'll take a cut at it. An architectural 
trend that is gathering momentum in 
telecom markets as well as semiconduc- 
tor design can be called “low and wide.” 
I believe George Gilder popularized the 
paradigm, calling it “low and slow” or 
“wide and weak,” some years back. That 
certainly makes sense for the all-optical- 
networks (high channel count and low 
power), but the paradigm is starting to get 
legs in computing architectures too. Sun 
Microsystems has consistently beaten the
big semiconductor players to market with 
multicore designs and its recent “Chip 
Multi-Threading” (CMT) offerings are 
a good example of what can be done to 
reduce power per thread, while retaining 
significant transaction rates. 

Optical interconnects, optical switch- 
ing and storage can break the planar pro- 
cessing bottleneck by going to 3D and back 
again. Both technologies have a transformational
potential that may change computing
density far faster than Moore’s Law, and in
so doing, change the application landscape 
for computing (as if it wasn’t changing fast 
enough already). In any case, these trends are 
very hopeful because they portend increases
in computing density/power (transaction rate 
per unit volume per watt). I’m not going to trip 
out about fly-size UAVs, but I do think these 
trends will enable huge rates of advancement 
for diagnostic medicine and for personalized 
pharmacological interventions, for disease 
control and management and for virology. 
With the threat of biological terrorism, that’s 
a comforting thought. I also look forward to 
talking and maybe just “thinking” about how 
to interface with high-mobility computing 
devices. These advances, along with retinal 
scanning and heads-up displays, ought to take 
the pain out of our neck for everyday com- 
puter users in the next five to ten years. 

As you might imagine, we plan to be 
part of this exciting evolution and there will 
be plenty of spinout technology to benefit 
our more traditional, embedded markets. 

You will have to ask a real “seer” about 
the twenty-year horizon!


Join a winning team...in a dynamic environment 








We’d like you to be part of our winning team.
Work with some of the hottest storage 
technologies in an exciting, team-oriented 
environment. SBE designs and provides IP-based 
storage networking solutions for an extensive 
range of business critical applications, including 
back-up and disaster recovery. 


We currently have openings for the following positions: 


Regional Sales Managers— 

Eastern, Northeast, and Southwest territories 

RSMs must have 7+ years experience in selling complex 
technology products to OEMs and managing major, strategic 
accounts.

Senior Software Engineer

Ideal candidate must have 7+ years of software development 
experience and strong knowledge of Linux kernel-level 
programming and debugging. 


Software Engineer 

Responsible for assisting in the development of a servlet- 
based web application, using Java and Apache; Linux system 
administration and scripting skills required to assist in the
maintenance of a script-based build-and-release system. 

2+ years software development experience a must. 


Product Marketing Manager 

Responsible for developing and implementing product strategy 
for new and existing storage products. 5-7 years product 
marketing experience in storage and/or networking industry 
required. 


Our compensation package includes very competitive salaries 
and benefits (medical, dental, 401K) plus stock options, so that 
all employees will share in our success. 


If you are interested, please send resumes with salary 
requirements via fax to 925.355.2041, email to
resumes@sbei.com, or mail to SBE, Inc., Attn: Human Resources, 4000
Executive Parkway, Suite 200, San Ramon, CA 94583

See for yourself how Microsoft Windows Embedded can help
you optimize device design, enhance reliability and integration,
and improve time to market and development costs.

Register now at www.rtecc.com


Enter a World of Embedded Computing Solutions 

Attend open-door technical sessions especially designed for those developing computer systems and 
time-critical applications. Get ahead with sessions on Embedded Linux, VME, PCI Express, ATCA, DSP, 
FPGA, Java, RTOS, SwitchFabric Interconnects, Windows, Wireless Connectivity, and much more. 


Your Resource Opportunity 
Exhibits arranged in a unique setting to talk face-to-face with technical experts. Table-top exhibits 
make it easy to compare technologies, ask probing questions and discover insights that will make a 
big difference in your embedded computing world. Join us for this complimentary event! 
MILS Middleware for
Secure Distributed Systems


Separating operating system services from the kernel allows for a small
and very secure kernel. The resulting middleware layer can then be run
so that it cannot violate the kernel and can thus be used to implement
multiple levels of security.


by Gordon Uchenick 
Objective Interface Systems 


Complex systems are designed and implemented in layers.
In secure systems, trust is distributed among those layers.
When security components are first assembled into layers,
their resulting security properties can be determined and those
layers then constructed into systems. This is an important element
of the certification and accreditation process needed for secure
systems. The Multiple Independent Levels of Security (MILS)
architecture was designed to preserve the integrity of the layers in a
complex system and to tightly control interaction among them.
This architecture has three layers: the separation kernel,
the middleware and the applications. The separation kernel is
very small and it’s simple to make it robust and evaluatable. It
is also well understood in the MILS community. In contrast, the
MILS middleware layer is extensive and much more complex,
especially in distributed systems. The purpose of this article is
to explore that layer. MILS middleware functions for distributed
systems can be grouped into three categories:
• Operating system (OS) services
• Partitioning communications system
• Network middleware


Operating System Services 

In traditional architectures with monolithic kernels, system 
services such as device drivers, memory management, I/O primi- 
tives, file systems and network protocol stacks are part of the 
kernel and run in privileged (supervisor) mode. These services 
are not part of a MILS separation kernel—they are moved to the 
middleware layer where they run in unprivileged (user) mode. 







This difference is important. When OS service code runs in 
privileged mode, there is no restriction on what it can do. Privi- 
leged code can potentially violate the security policy. Therefore, 
rigorous examination and testing are required to verify that the 
code does not violate the policy, even indirectly. The level of 
effort required to perform this verification escalates rapidly as 
code size and complexity grow. The MILS separation kernel
rigorously enforces four fundamental security policies: data
separation, control of information flow, periods processing and damage
limitation. Because MILS OS services run in user mode, code
that was previously able to violate these policies is now strictly
subject to them. Evaluation and certification are more
straightforward and effective.

[Figure: MILS Network Implementation. Diagram labels: Application; Network Protocols & Drivers; MILS.]

MILS OS services can be implemented in a number of use- 
ful ways. They can become part of guest operating systems, re- 
side in dedicated partitions, or be organized as minimal libraries. 
A system can use all of these implementations simultaneously. 

The industry paradigm of “reuse, don’t reinvent,” makes it 
probable that the application code in a MILS partition was origi- 
nally developed for another OS such as Windows, Linux, or Vx- 
Works. An adaptation to their hardware abstraction layer (HAL) 
can cause these operating systems to “see” the separation kernel 
as the processor, becoming the separation kernel’s “guest.” A 
guest OS runs in a user-mode MILS partition, working only with
the memory and constrained to the CPU time budgets that have 
been configured for that partition. This symbiotic relationship be- 
tween the minimal MILS separation kernel and a fully featured 
guest OS has several benefits: 

• Protects investment in existing software
• Familiar API and development environment for new applications
• Enables unit testing on commodity hardware





[Figure: The Partitioning Communications System (PCS)
intercepts data before it reaches the network
protocol stack (red) where it is encrypted before
being placed into a less trusted environment.]


A guest OS is larger and more complex than the MILS 
separation kernel. To manage schedule risk and reduce eval- 
uation cost, partitions that include a guest OS should process 
a single level (e.g., secret vs. top-secret) of classified data. 
If a single CPU processes both secret and top-secret data, 
then one partition should contain the secret data and another 
partition should contain the top-secret data. Each partition 
is then “system high.” Evaluations of the secret and the top- 
secret partitions are independent because the high robust- 
ness MILS separation kernel can be trusted to keep them 
separate. A medium robustness evaluation for each application
partition is appropriate. Medium robustness is approximately
Common Criteria Evaluation Assurance Level 4
(EAL4). Evaluation at this level is practical and achievable 
for large code bodies. 

In the MILS architecture, a network protocol stack is 
middleware. Network protocol stacks often require their own 
execution context. Isolating network protocols into their own pri- 
vate partitions has several advantages: 
• Facilitates use of the stack by multiple applications
• Protects the applications and the stack from each other
• Makes stack evaluation independent of application evaluation


Existing distributed application code bases can be most
effectively leveraged if the network API, such as the socket()
interface, is not changed. When the socket API and the network 
protocol stack reside in the same partition, send() nominally 
copies outbound data to network buffers and then passes control 
to the stack’s lower layers. A MILS socket library, illustrated in 
Figure 1, uses the separation kernel’s facility for controlled infor- 
mation flow to send outbound data to the network protocol stack 
in its own partition. Inbound data is handled similarly but in the 
opposite direction. 
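
The forwarding behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: KernelChannel, MilsSocket and the partition names "app" and "stack" are hypothetical stand-ins, not part of any real separation kernel or MILS socket library, and an in-memory queue merely models a kernel-configured, one-way information flow between two partitions.

```python
from collections import deque


class KernelChannel:
    """Hypothetical stand-in for a one-way flow configured in the kernel."""

    def __init__(self, writer, reader):
        self.writer = writer      # the only partition allowed to write
        self.reader = reader      # the only partition allowed to read
        self._queue = deque()

    def write(self, partition, data):
        if partition != self.writer:
            raise PermissionError(f"{partition} may not write to this channel")
        self._queue.append(bytes(data))

    def read(self, partition):
        if partition != self.reader:
            raise PermissionError(f"{partition} may not read from this channel")
        return self._queue.popleft() if self._queue else None


class MilsSocket:
    """Keeps a familiar send() call but hands data to another partition."""

    def __init__(self, partition, outbound):
        self._partition = partition
        self._outbound = outbound

    def send(self, data):
        # No protocol processing here: the stack lives in its own partition.
        self._outbound.write(self._partition, data)
        return len(data)


# Assumed configuration: the application partition may only write,
# the protocol stack partition may only read.
channel = KernelChannel(writer="app", reader="stack")
sock = MilsSocket("app", channel)
sent = sock.send(b"hello")
received = channel.read("stack")
```

The application code still calls send() as before; only the library behind the call changes, which is the point of leaving the socket API untouched.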

Some partitions may contain multi-level applications or 
security functions such as downgraders or guards. High ro- 
bustness evaluations, approximately EAL6, are appropriate for 
these partitions. If any code in a partition must be evaluated at 
high robustness levels, then all code in that partition must be 
evaluated at high robustness. Cost and risk management dictate 
that these partitions contain the absolute minimum amount of 
code possible. A reasonable approach is to organize OS ser- 
vices into a minimal library to be used in multi-level security 
(MLS) partitions. 


Distributed System Threats 

Communication exposes data to significant threats to confi- 
dentiality, integrity and availability. The sources of these threats 
are vulnerabilities in the network components that connect com- 
puters: protocol stacks, interface drivers, switches, routers, etc. 
Procurement policy mandates the use of COTS network com- 
ponents, creating a paradox. If these components provide any 
protection at all, it has typically been implemented as a fail-first/ 
patch-later afterthought, fraught with well-known vulnerabilities. 
We are required to place data in the custody of network compo- 
nents that were designed to transport data, not to protect it. 

Network protocol stacks such as TCP/IP are large and com- 
plex. They can be attacked in many ways. CERT lists hundreds 
of vulnerabilities in TCP/IP that are child’s play to a motivated 
attacker. The typical reaction to stack vulnerabilities is to apply 
encryption at the link level. This is not always effective. Applica- 
tion or stack vulnerabilities can cause encryption to be bypassed. 
Even when applied, encryption suffers from key management issues
that limit its effectiveness. While it is a necessary first step,
encryption is not enough to completely protect data while it is 
in transit. 


The Partitioning Communications System

The previously defined paradox has a solution. The separa- 
tion kernel’s security policy enforcement can be leveraged to in- 
terpose a trustworthy protection function. This protection func- 
tion is the Partitioning Communications System (PCS), MILS 
middleware that is on the road toward a high robustness evalua- 
tion. The PCS is interposed between the application and the net- 
work protocol stack as illustrated in Figure 2. 

A straightforward way to understand the PCS is to view it as
a super virtual private network (VPN) that enforces security
policies on transmissions between MILS nodes. The PCS encrypts
the data before it reaches the protocol stack. The separation
kernel’s guaranteed control of information flow ensures that
encryption can’t be bypassed. Unencrypted (“red”) data that could
have been diverted to an unauthorized recipient by a vulnerable
protocol stack is now safe. The data is encrypted (“black”) before
it is ever put into the custody of code that is not as trustworthy
as the PCS.

An attacker can still gain control over the other end of a cir- 
cuit. IP and MAC addresses can easily be forged. A related attack 
is to kidnap the TCP port number. When this happens, we could 
be blindly sending classified data to the wrong program just be- 
cause it is running in the right computer. The PCS counters these 
threats by strongly identifying the other computer, application 
and application instance before allowing data to flow. 

The PCS keeps data with multiple security levels and Com- 
munities of Interest (COIs) robustly separated. DoD networks of- 
ten have multiple physical links connecting the same endpoints. 
Each link is dedicated to a unique classification level or COI to 
effect separation. These “air gaps” work, but they are costly and 
awkward. Human intervention is required to downgrade data and 
manually move it between networks. When the PCS is interposed 
in the data path, the separation of data with multiple levels and 
COIs on a single link is trustworthy. It is no longer necessary
to have multiple links just to guarantee data separation. Auto- 
mated downgraders can move data between levels. When human 
downgrading or verification is required, data flows faster because 
single COI networks can be physically interconnected without 
introducing additional threats. 

In addition to separation, the PCS enforces bandwidth al- 
location among the applications sharing a physical channel. Use 
of the channel by one application can’t affect the availability of 
that channel to any other application, guaranteeing Quality of 
Service, a requirement for robust distributed systems. 

Evil intentions can be accomplished by blocking communications.
Flooding a network with packets implements a Denial of
Service (DoS) attack, equivalent to jamming radio frequencies.
The PCS limits the amount of traffic that can be sent to an
application to what is configured in the security policy, reducing a
DoS attack’s damage potential.
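
One common way to cap inbound traffic at a configured level is a token bucket. The sketch below is an assumption made for illustration; the article does not say which mechanism the PCS uses, and the class name, rate and burst values are hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```python
class TrafficBudget:
    """Token bucket: admit traffic only within a configured rate and burst."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = float(rate_bytes_per_s)   # long-term allowance
        self.capacity = float(burst_bytes)    # largest burst ever admitted
        self.tokens = float(burst_bytes)      # start with a full bucket
        self.last = 0.0                       # time of the previous decision

    def admit(self, nbytes, now):
        # Refill in proportion to elapsed time, never beyond the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False   # over budget: drop or defer the message


# Hypothetical policy entry: 100 bytes/s sustained, 200-byte burst.
budget = TrafficBudget(rate_bytes_per_s=100, burst_bytes=200)
decisions = [
    budget.admit(150, now=0.0),   # fits in the initial burst
    budget.admit(100, now=0.0),   # only 50 bytes of budget remain
    budget.admit(100, now=1.0),   # one second later, 100 bytes refilled
    budget.admit(100, now=1.0),   # budget exhausted again
]
```

However hard a sender floods the link, the receiving application sees no more than the configured budget, which bounds the damage rather than preventing the attack.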

Unauthorized use of authorized channels can leak sensitive 
data. By controlling message length and/or timing, a subverted 
application can signal information completely unrelated to what 
is being transmitted. These unintended data paths through the 
system are called Covert Channels. The PCS can be configured 
to make all message fragments the same size and to occur at 
constant time intervals, countering that threat. 
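
The countermeasure can be sketched as follows, assuming a hypothetical 64-byte fragment with a 2-byte length header; the real PCS wire format is not described in this article. Every fragment has the same size, and a filler fragment occupies any time slot that has no data, so neither message length nor timing carries extra information.

```python
import struct

FRAGMENT_SIZE = 64                      # assumed fixed size on the wire
HEADER = struct.Struct(">H")            # 2-byte count of real payload bytes
PAYLOAD_MAX = FRAGMENT_SIZE - HEADER.size
IDLE = HEADER.pack(0) + bytes(PAYLOAD_MAX)   # filler sent when nothing is pending


def to_fragments(message):
    """Split a message into equal-size fragments, zero-padding the last one."""
    fragments = []
    for start in range(0, len(message), PAYLOAD_MAX):
        chunk = message[start:start + PAYLOAD_MAX]
        fragments.append(HEADER.pack(len(chunk)) + chunk.ljust(PAYLOAD_MAX, b"\0"))
    return fragments


def fill_slots(fragments, slots):
    """Return exactly one fragment per fixed time slot, padding with IDLE."""
    if len(fragments) > slots:
        raise ValueError("not enough time slots for the pending fragments")
    return fragments + [IDLE] * (slots - len(fragments))


def from_fragments(fragments):
    """Receiver side: strip padding and filler, then rebuild the message."""
    out = bytearray()
    for fragment in fragments:
        (count,) = HEADER.unpack_from(fragment)
        out += fragment[HEADER.size:HEADER.size + count]
    return bytes(out)


message = b"x" * 100
wire = fill_slots(to_fragments(message), slots=5)
recovered = from_fragments(wire)
```

An observer of the link sees five identical-length fragments at five regular intervals whether the application sent 100 bytes or nothing at all. The price is bandwidth spent on padding and filler.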

The PCS is responsible for loading and running dynamically 
invoked applications. Before loading, the PCS verifies that the 
code comes from an authorized source and hasn’t been modi- 
fied. This validation blocks viruses, worms, Trojan horses and 
other subversions. The PCS also constrains the applications that 
it loads to budgets for memory, CPU time, kernel object alloca- 
tions and inter-process connection privileges. 
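
A minimal sketch of the verify-then-load step is shown below. It uses a keyed hash (HMAC-SHA-256) from the Python standard library as a stand-in for whatever signature scheme a real PCS would use; that substitution, the function names and the budget fields are assumptions for illustration only.

```python
import hashlib
import hmac


def make_tag(image, key):
    """Authentication tag that an authorized source attaches to the code."""
    return hmac.new(key, image, hashlib.sha256).digest()


def load_application(image, tag, key, budget):
    """Refuse to load code whose origin or integrity cannot be confirmed."""
    expected = make_tag(image, key)
    if not hmac.compare_digest(expected, tag):   # constant-time comparison
        raise PermissionError("image rejected: origin or integrity check failed")
    # The loaded application is bound to its configured resource budgets.
    return {"image": image, "budget": dict(budget)}


key = b"demo-key-not-for-real-use"            # hypothetical shared secret
image = b"application code bytes"
tag = make_tag(image, key)
loaded = load_application(image, tag, key, {"memory_kb": 512, "cpu_percent": 10})
```

A single changed byte in the image, or a tag produced with a different key, causes the load to be refused before any of the code runs.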

The initial crack in a distributed system’s defenses can often 
be caused by forcing error conditions that cause nodes in the sys- 
tem to run with different versions of the overall security policy. 
Each node is correctly enforcing its local policy, but
inconsistencies or incompatibilities between two or more nodes can lead to
system-wide security compromise. The PCS verifies that secu- 
rity policy configurations are consistent before data is exchanged. 
This protection is provided for both centralized and federated 
policy distributions. 
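
One simple way to check consistency is for each node to compare a digest of its policy configuration with its peer's before any data flows. The sketch below assumes the policy can be represented as a JSON-serializable dictionary; the policy fields and function names are hypothetical, and the article does not describe how the PCS actually performs this check.

```python
import hashlib
import json


def policy_digest(policy):
    """Digest of a canonical encoding, so key order cannot cause a mismatch."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def may_exchange_data(local_policy, peer_digest):
    """Allow data to flow only if both ends hold the same policy."""
    return policy_digest(local_policy) == peer_digest


# Hypothetical policies: nodes A and B agree, node C holds a stale version.
node_a = {"version": 7, "flows": [["app1", "app2", "secret"]]}
node_b = {"flows": [["app1", "app2", "secret"]], "version": 7}
node_c = {"version": 6, "flows": [["app1", "app2", "secret"]]}

a_and_b = may_exchange_data(node_a, policy_digest(node_b))
a_and_c = may_exchange_data(node_a, policy_digest(node_c))
```

Node C may be enforcing its own policy perfectly, yet it is still refused, which is exactly the inconsistency the paragraph above warns about.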


Network Middleware 

The conversion path to MILS for CORBA, DDS and Web 
Services is straightforward. Guest operating systems and the 
MILS socket library facilitate porting from the platforms where 
these distributed application foundations are currently supported. 
Typical implementation is a library to be linked with the appli- 
cation code. Therefore, application code and the libraries that it 
uses reside in the same partition. 

These application foundations all have some security fea- 
tures dealing mostly with authentication and access control. They 
all have the same weakness: the application and the middleware 
security functions reside in the same partition. Because they are 
in the same address space, the security functions can’t be pro- 
tected from an errant or malicious application. 

Even with this weakness, applications using these founda- 
tions and their security features on a MILS platform with the PCS 
are more secure than on a traditional Operating System platform. 
The middleware security functions operate as they did in their 
original environments. The MILS platform counters additional 
threats that CORBA, DDS, or Web Services security functions 
do not even consider. Protecting CORBA, DDS, or Web Services 
security functions from the applications that use them is a topic 
for further research. 

The key benefit of MILS, dramatically reducing the amount 
of security-critical code so that we can dramatically increase the 
scrutiny of that code, is just as relevant to middleware as it is to 
the separation kernel. By leveraging the separation kernel’s guar- 
antees of data isolation and control of information flow, the total 
complement of software required to implement a distributed sys- 
tem can be divided into a set of independent components. These 
components are protected from each other, resulting in a more 
robust system. 

The key benefit of MILS also applies to end-to-end security 
policy enforcement for distributed systems as implemented by the 
PCS. The amount of security policy enforcing code is dramati- 
cally reduced. Systems are more robust and resilient to attack. 
The MILS architecture also reduces distributed system project 
cost and schedule risk for component evaluations and certifica- 
tions as well as for accreditation of the system as a whole.


Objective Interface Systems 
Herndon, VA. 
703-295-6500.
[www.ois.com]. 
Multiband Transceiver Platform Boosts
Performance, System Density 


In high-bandwidth data recording and playback applications such as 
radar, direction finding, SIGINT, telemetry and commercial wireless base 
stations, fitting more performance into a smaller space is always a plus. 
A new real-time multiband recording and playback transceiver develop- 
ment platform from Pentek makes available an extremely wide range of 
signal processing technology and resources in a compact space. 


Occupying a single VME slot, the RTS 2504 consists of the
4205 I/O Processor and the 7140 Dual Digital Up/down-converter
PMC/XMC Module. The RTS 2504 system is a fully programmable
development platform with a 1 GHz G4 PowerPC and multiple
Virtex-II/Virtex-II Pro FPGAs. It features real-time recording
and playback to JBOD disk arrays at up to 160 Mbytes/s. Highly
scalable, it has from 2 to 40 transceiver channels, dual 14-bit
105 MHz A/D converters, dual 16-bit 500 MHz D/A converters,
digital down-converters and optimized GateFlow FPGA DSP functions.
Interfaces include Fibre Channel and 100 BaseT Ethernet ports.
The RTS 2504 will be available in the third quarter of this year. 
Pricing starts at $26,995. 


Pentek, Upper Saddle River, NJ. (201) 818-5900. [www.pentek.com].









SBC Controls Large Area, High-Resolution LCD 
Displays 

Large area, high-resolution TFT LCDs are used in applications 
such as digital signage and POI/POS, as well as gaming, medical and 
industrial displays. To design the subsystems that control these displays, 
engineers need the right combination of features, which are provided by 
a new EBX form-factor SBC from Apollo Display Technologies. 


The Galaxy SBC features either the 400 MHz or 650 MHz Intel
ULV Celeron processor and up to 512 Mbytes of SO-DIMM memory in a
compact, 8-in. x 5.75-in. form-factor. The fanless design minimizes power
loss and heat buildup. Four versions are available: a Best Performance
version with an Intel 815 chipset for simple 2D, two High-Performance
versions with M6 or M7 ATI graphics and 16 Mbytes of graphics memory,
and a High-End Performance version with M9 ATI graphics and
64 Mbytes of graphics memory for high-end 3D. This version
includes driver support for dual display mode, with separate
content per display, and supports portrait/landscape display modes.

The Galaxy supports popular operating systems such as Microsoft
Windows, Win CE and Linux. Delivery is typically 6-8 weeks. Pricing
for the Best Performance version is $232.07 in production quantities
of 10K units.


Apollo Display Technologies, Ronkonkoma, NY. (631) 580-4360. 
[www.apollodisplays.com]. 
FPGA-Based PC/104 Processor Card Provides
Development, Deployment Options 


High-performance signal processing sys- 
tems, like those used for signal intel- 
ligence and software radio, increas- 
ingly depend on FPGAs to handle 
specialized processing functions. A 
new PC/104 card stack from Nallatech 
takes advantage of multiple FPGAs to 
handle processing and system manage- 
ment, while providing expansion slots for 
additional system resources. 


The BenNUEY-PCI-104-V4 can sup- 
port multiple analog and digital I/O interfaces 
and multiple memory types, as well as up to seven 
FPGAs, in a single PC/104 stack. Three DIME-II expansion slots let 
engineers optimize system resources to meet processing, memory and 
I/O requirements. The BenNUEY-PCI-104-V4 comes with an onboard 
Xilinx Virtex-4 FPGA. Up to six additional FPGAs are available on the 
DIME-II expansion modules, which support Virtex-4, Virtex-II Pro and 
Virtex-II FPGAs. 


Applications requiring data storage will benefit from the 16 Mbytes 
of DDR-2 SRAM connected to the Virtex-4 FX user FPGA, which can 
be used for storing algorithmic data or for buffering data around the 
multi-FPGA architecture. An Ultra-SCSI connector provides digital 
I/O. Linux and Windows are supported, and FUSE FPGA Computing 
Runtime Software is included. Pricing for the BenNUEY-PCI-104-V4 
starts at $19,995. 


Nallatech, Glasgow, Scotland. +44 (0) 1236 789518.
[www.nallatech.com]. 







Multi-Processor VME Board Provides High 
Throughput, Low Latency 


A new multi-processor VME board from Cornet Technology pro- 
vides the high throughput and low latency needed to support a wide 
variety of intricate military/aerospace 
applications that require multi-process- 
ing power. 


The Celero CVME-7410 features up to 
four independent PowerPC nodes linked by 
a PCI bus on a single VME board to help 
designers handle complicated processing 
functions, such as image processing, digital 
signal processing, sonar control and radar 
control. The board supports Linux and
is the first VME board to support the 
VxWorks Safety-Critical platform. 
This enables system designers to de- 
velop systems that are compliant with 
ARINC-653 and DO-178B, which are recommended and often man- 
dated by military and avionics programs. 

The Celero CVME-7410 will be available for shipping in the third
quarter of this year. Pricing starts at $14,995. 


Cornet Technology, Springfield, VA. (703) 658-3400.
[www.cornet.com]. 







Motion Control Chip Has Field-Oriented Control 
A new motion control IC includes field-oriented control (FOC) 
capability, a computationally intensive technique for greater motor ef- 
ficiency and higher top-end rotation speed in applications with high- 
performance, low-cost brushless motor amplifiers. The compact and in- 
telligent single-axis MC73110 from Performance Motion Devices also 
delivers a wide range of usable motor speeds along with 
numerous monitoring and control features. 






FOC capability gives a speed advantage over sinusoidal
commutation in brushless DC motor systems. It also offers a
significant improvement over standard variable-speed drive techniques.
Utilizing an on-chip A/D converter, the MC73110 digitizes analog current
feedback and provides multiple current control methods to maximize
motor performance. The chip can also
perform Hall-based and sinusoidal commutation. Other features include 
PID velocity loop, PI current loop compensation, trajectory generation, 
encoder input and Hall sensor input. Analog or digital command input, 
profile generation and six-signal symmetric PWM (pulse width modu- 
lated) waveform generation are also standard. 

The MC73110 can be operated as a stand-alone intelligent motion 
IC or via serial commands as a programmable axis controller. It is pack- 
aged in a compact 64-pin TQFP and operates from 3.3V. Prices start at 
$18 in OEM quantities. 

Performance Motion Devices, Lincoln, MA. (781) 674-9860. 
[www.pmdcorp.com]. 
Blade servers used in communication and control applications in
harsh environments call for high compute bandwidth and I/O through- 
put, as well as rugged design. A new 6U CompactPCI SBC from Men 
Micro has all three. 

The 64-bit/66 MHz D6 system controller board features a Pentium M
or Celeron M processor. An onboard FPGA implements the board’s
graphics controller and lets the board support a wide range of
application-specific I/O. Up to 2 Gbytes of ECC DDR2 DRAM is included,
along with non-volatile FRAM and SRAM application memory. Sixteen
of the D6’s 24 PCI Express lanes are devoted to the board’s two
Gigabit Ethernet interfaces. The other eight are used for one or two
XMC modules. The 7520 Intel server chipset has two SATA interfaces for
high-speed connections to mass storage devices, and an onboard PATA
interface is available for a hard disk or CompactFlash storage. An
Intelligent Platform Management Interface (IPMI) is included for
monitoring and controlling critical onboard parameters.

Pricing for the version with a 2 GHz Pentium M and two XMC 
sites starts at $2,994 for single units. 
Men Micro, Lago Vista, TX. (512) 267-8883. [www.menmicro.com]. 
One serial port
Two Multitech Universal Sockets 
PC104 interface to high speed quad UART 
Supports 3.3V and 5V Multitech modules 
Optional 16 channel GPS receiver 
Hot-Swap, 12-Port 10 Gbit/s Fibre Channel/
Ethernet/iSCSI Card 


A new hot-swappable 10 Gbit/s Fibre Channel/Ethernet/iSCSI card is de- 
signed for use in Curtiss-Wright’s space-saving 144- and 288-port GLX4000 
Physical Layer Switch products. The RT10000 provides a modular method of 
scaling the high-density GLX4000 switch to match specific network require- 
ments. Each port card can accept up to 12 XFP transceiver modules. 


The RT10000 port card supports a wide range of optical wave- 
lengths. Its XFP transceiver modules can be configured with 850 nm, 
1310 nm, or 1550 nm wavelength optical interfaces. This enables users 
to convert network media from 850 nm optical to 1550 nm optical for 
DWDM applications. Curtiss-Wright also includes built-in signal reti- 
ming on each port to ensure signal quality. 


In addition, the RT10000 provides real-time diagnostic data, via
XFP Digital Diagnostics (XDD), to deliver the information for timely
repairs or prognostics. Data captured by XDD includes vendor,
transceiver type, range, protocol, transceiver temperature, transmit and
receive optical power and transceiver supply voltage. The RT10000 also
reports the temperature measured at both ends of the card. All XDD
information can be accessed through a simple-to-use GUI or powerful
CLI. Availability of the RT10000 port card is off-the-shelf in Q2 2006.
Single-unit pricing starts at $24,000.


Curtiss-Wright Controls Embedded Computing, Leesburg, VA. 
(937) 252-5601. [www.cwcembedded.com]. 








VME Multiprocessor Board for Advanced Signal 
and Image Processing Apps 


Designed to meet the most demanding needs of signal and image 
processing applications, a new multiprocessor solution from GE Fanuc 
Embedded Systems is based on Freescale’s MPC7447A and MPC7448 
processors containing PowerPC cores. The Nexus Quattro is a quad- 
processor 6U VME board that is the latest member of the Nexus family 
of multiprocessing solutions. Available in versions for both rugged and 
benign environments, the Nexus Quattro is designed to deliver perfor- 
mance and computing density required by radar, sonar, signals intelli- 
gence and image processing applications deployed in harsh operational 
environments. 


Key features of the Nexus Quattro include four Freescale Semi- 
conductor MPC7448 processors at up to 1.4 GHz, or four MPC7447A 
processors at 1.1 GHz. Memory consists of 256 
or 512 Mbytes of DDR266 SDRAM with 
ECC per CPU for a total of up to 
2 Gbytes per board. Nexus 
Quattro has two IEEE P1386 
PMC sites with up to 1064 
Mbytes/s peak PCI performance
along with ANSI/VITA 31.1
(Gigabit Ethernet) support and peer-level 
VME access for all processors. The board provides VME320 2eSST 
support using the Tundra TSI148 PCI-X-to-VME Bridge for up to 320 
Mbytes/s VME performance and compatibility with legacy VME de- 
signs. OEM quantity pricing starts at $10,000. 


GE Fanuc Embedded Systems, Huntsville, AL. (800) 322-3616.
[www.gefanuc.com/embedded]. 
DSP Card Combines DSP and FPGA for
Wireless/High-Speed Applications 

Combining a 600 MHz TMS320C6416 from Texas Instruments and
a 6-million-gate Virtex-II FPGA, the Quixote DSP card from Innovative
Integration also supports MatLab. Quixote provides bandwidth for
cPCI-based software-defined radios, wireless IP development and
hardware testing, ultra-fast flexible data acquisition, vector signal
generation, signal identification, radar and electronic warfare. The
analog front end features 105 MHz, 14-bit input and output
channels tightly coupled to the FPGA for narrow band or wideband
wireless applications. The TI DSP handles high-end signal processing
like multi-threaded programs, adaptive algorithms and communication
with the host layer. The FPGA provides ultra-high-speed,
hardware-assisted processing for channel spreading/de-spreading,
channel coding and down conversion.


The analog front-end is tightly integrated with the FPGA and its
processing core with high-speed private memory, a 1 Gbyte/s burst bus
from the FPGA to the DSP and high-speed connectivity to the system
using a 64-bit PCI or ChannelLink interface. Developers can implement
in software a wide variety of radios, communications testers and other
high-speed applications. The Quixote Development Package includes the
Quixote card, TI Code Composer Studio™ DSP development and debugging
tools, JTAG debugger and the Pismo toolset for Quixote. The Quixote
boards are listed from $9,900 to $19,500 depending on the configuration.


Innovative Integration, Simi Valley, CA. (805) 758-4260. 
[www.innovative-dsp.com]. 


Real-Time Data Accessible Across Enterprise 
and Embedded Systems 


A solution for real-time distributed applications that require
high-performance and highly available database access, Real-Time
Innovations’ Distributed Data Management v3 now includes Oracle
TimesTen In-Memory Database 6. It also adds support for remote
database access via RTI Data Distribution Service 4.0 (formerly NDDS)
and includes support for Java Database Connectivity (JDBC). Together,
the RTI Distributed Data Management middleware and Oracle TimesTen
In-Memory Database deliver the fast cross-network data access
increasingly required of both embedded and enterprise applications.













Also new in RTI Distributed Data Management v3 is support for
standard JDBC. By providing SQL access via JDBC, ODBC (open
database connectivity) and DDS, RTI Distributed Data Management
integrates new and legacy code. Developers using the C, C++ or Java
development languages can choose the standard interface that best suits
their application requirements. Because each computer is a peer and
there is no master centralized database, there are no performance
bottlenecks or single points of failure. RTI Distributed Data Management
also provides automatic discovery, so computers can be dynamically
added to a live system with no administration required. RTI Distributed
Data Management v3 including Oracle TimesTen In-Memory Database is
available today for the Microsoft Windows, Red Hat Linux and Sun Solaris
platforms. Development licenses begin at $69,200 for three developers.


Real-Time Innovations, Santa Clara, CA. (408) 200-4700. [www.rti.com].


Boards Support Xilinx Virtex-5 Domain 
Optimized FPGAs 


Two board-level products, based on open standard form-factors
including PMC and PCI, and supporting the new Xilinx Virtex-5
family of FPGAs, are targeted for deployed embedded applications as well
as development platforms. The early adoption of the Virtex-5 FPGA
family by VMetro enables customers to deploy new high gate
density, high-performance solutions with lower power. Thus,
VMetro can provide more computing and throughput in a given
space. Next-generation applications typically requiring these
capabilities include Software Defined Radio (SDR), Signal Intelligence
(SigInt) and real-time imaging.

VMetro’s initial products are based on the Xilinx LX50 and LX110
FPGAs, the first Virtex-5 platforms now available. For PCI-based
development platforms, VMetro’s DEV-FPGA05, based on the Xilinx LX50
Virtex-5 FPGA, will be bundled with a software support package to
allow Virtex-5 applications to be evaluated and developed at low cost.
For deeply embedded applications typically found in medical imaging
or aerospace and defense systems, the PMC-FPGA05D PMC module
with a Virtex-5 LX110 will be available. I/O modules already developed
include parallel LVDS, CameraLink and RS-422. Additional custom
modules can be designed by development engineers or VMetro. The
PMC-FPGA05D and DEV-FPGA05 products will be available
September 1, 2006 with single-unit pricing from $5,995.


VMetro, Houston, TX. (281) 584-0728. [www.vmetro.com]. 


System-on-Module-ETX CPU Provides Improved 
Power/Performance 


Aimed at providing a balance of computing performance with low
power consumption, the SOM-4455 SOM-ETX-compliant CPU module
from Advantech uses the same mechanical design and layout as its
predecessor, providing an upgrade path for AMD Geode GX1
processor-based systems. The module is equipped with the AMD Geode LX
processor and the AMD Geode CS5536 companion device. Because the
module mounts on the customer’s own application-specific carrier board,
embedded integrators can focus on their applications, saving up to
80 percent of overall design and development effort.


The latest RoHS-compliant AMD Geode LX processor delivers 30 to
50 percent higher performance than the previous AMD Geode GX1
processor at equal or lower power consumption, making the AMD Geode
LX processor suited for smaller SOM-ETX form-factors. The SOM-4455’s
fanless, low-thermal design benefits from the processor’s lower power
draw. The Geode LX supports DDR memory up to 256 Mbytes and 64-bit
onboard graphics. The SOM-4455 has four USB 2.0 ports, 10/100Base-T
Ethernet and LVDS/LCD/AC’97 interfaces, and PCI and ISA are fully
supported. A flexible and vibration-resistant CompactFlash card for
installing Windows XP/XPe and Windows CE 5.0 embedded OSs is supported
to provide custom OS image development. Pricing for the SOM-4455
starts at $200.


Advantech, Milpitas, CA. (408) 519-3891. [www.advantech.com]. 
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Altera Cyclone FPGA, 275MHz core speed 
Four expansion connectors 
50MHz oscillator, SMB connector 
On board real-time counter/clock 
On board unique silicon serial number 
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High-Density Storage Blades Target Demanding Communications Apps 


A storage blade and a storage expansion blade for high-performance, high-availability communications and commercial ap- 
plications offer a combined storage capacity of two terabytes for embedded database, data caching and file serving applications. 
The CPC5900 storage blade and the CPC5910 expansion blade from Performance Technologies are meant to be integrated with 
PTI’s Advanced Managed Platforms. 


The CPC5900 is a high-density, high-availability NAS or SAN storage blade with two hot-swappable SATA hard drives and 
an onboard PowerPC processor. The product supports applications requiring full RAID arrays (0, 1, 0+1, 4, 5, 6) and is suited 
for embedded database servers, as an application server for storage-intensive services, as a storage appliance, or as a
mission-critical logging device in defense and homeland security applications. It can also be used as a PXE boot server for
automated deployment of raw boards and supports Performance Technologies’ NexusWare Core development, management and operating
environment.


The CPC5910 is a low-cost, high-density SATA storage expansion blade, expanding the storage capacity for either the 64-bit 
CPC5564 SBC or for CPC5900 blades. It features two hot-swappable, enterprise-class SATA hard drives with a SATA interface. 
Paired with the CPC5564, which provides the computing power to make this product a full RAID and IPSEC solution, it can offload
the locally attached CPC5564 single hard drive to provide more reliability, higher capacity and higher performance. Pricing for the
CPC5900 starts at $2,600 and at $1,675 for the CPC5910. 


Performance Technologies, Rochester, NY. (585) 256-0200. [www.pt.com].
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We design solutions. 

Mercury customers span multiple industries and face unique computing challenges, whether improving yields
for semiconductor wafer inspection, increasing throughput in medical imaging, rendering high-quality animation
within a defined budget, or packing enormous processing capacity in deployed ground vehicles. Why are we so
driven to tackle these difficult computing problems? Because your challenges drive our innovation.


Our new 1U Dual Cell-Based Server significantly improves performance for computationally
intensive HPC applications in medical, industrial, defense, seismic, telecommunications, and
other industries. Contact us to learn how our products
and services can optimize your challenging applications.


Mercury Computer Systems, Inc.


Challenges Drive Innovation™ 


Let us design an innovative solution for you. 
www.mce.com/rtc6 
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Pluggable 14-Slot AdvancedTCA Full Mesh 
Backplane 


A 14-slot ATCA Full Mesh backplane from Elma Bustronic fea- 
tures an 18-layer stripline design with pluggable fan tray, shelf manager 
and power entry connectors. Compliant to the PICMG 3.0 specification, 
the backplane has a theoretical performance of 1 Tbit/s. Elma Bustron- 
ic’s Signal Integrity (SI) lab has performed simulation and backplane 
characterization. 


Dual Star or Mesh configurations can be utilized within this 
same backplane, offering customers more flexibility in their design. 
AdvancedTCA has several key features including Gigabyte/Terabyte
per second bandwidth across each shelf, 150-200W per board and
3 Kilowatts per chassis power, and accommodates larger (8U x 280mm)
boards on a 1.2-inch pitch, which allows larger/taller components and
more space on each board. Pricing
is under $2,000 depending on volume and configuration requirements. 
The lead-time is 4-6 weeks ARO. 


Elma Bustronic, Fremont, CA. (510) 490-7388. 
[www.ElmaBustronic.com].





Rugged 6U VME SBC Boasts Independent Dual 
Processors 


An asymmetrical distributed architecture eliminates data flow bot- 
tlenecks in high-performance applications such as mission management 
computers, heads-up display controllers, radar and sonar processors, 
and advanced IED automatic protection subsystems. A rugged VME 
SBC from Aitech employs independent dual processors so each func- 
tions as a complete subsystem with local memory resources and basic 
I/O interfaces. 


The VME64x-compliant 6U C102 SBC incorporates two PowerPC 
G4+ MPC7448 processors operating at 1.42 GHz with on-chip 32 
Kbyte L1 and 1 Mbyte L2 caches, communicating over a high-speed
PCI-X bus. Up to 2 Gbytes of ECC DDR SDRAM, 256 Kbytes of NVRAM,
up to 256 Mbytes of boot flash and up to 1 Gbyte of user flash memory
are included, as well as up to 16 Gbytes of NAND onboard flash
file memory. For connectivity and expansion, the board features
two 10/100/1000 Ethernet and two 10/100 Ethernet ports, two USB 2.0
ports, one SATA II port and two dual redundant MIL-STD-1553B
ports, as well as six high-speed USART synchronous/asynchronous and 
two UART serial ports, and two PMC modules. 


BSPs are available for WindRiver VxWorks and Green Hills IN- 
TEGRITY RTOSs, among others. Options include single processor 
configurations and conduction- or air-cooled models. Pricing starts at 
$6,750. Delivery is from stock to 8 weeks ARO. 


Aitech Defense Systems, Chatsworth, CA. (888) 248-3248. 
[www.rugged.com]. 
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Wireless Device Server Module is 802.11b/g- 
Compatible 


Handheld devices in WiFi machine-to-machine networks are get- 
ting faster connection rates via the newest 802.11g wireless networking 
standard. But designers of systems incorporating this protocol also need 
backward compatibility with previous 802.11b devices. A new line of
802.11b/g embedded modules from Quatech provides exactly that.


The drop-in Airborne Wireless Device Server Module features 
802.11g connection rates as well as plug-and-play compatibility with 
the company’s 802.11b products. The module’s built-in TCP/IP stack, 
RTOS and application firmware permit instant connectivity to a LAN or
the Internet, with no device driver or host processor firmware
development required. The module offers power management capabilities,
integrated Web server, general-purpose I/O and embedded interface
support for UART, I²C, SPI and industry-standard RS-232/422/485
protocols.


An extended operating temperature range of -40° to +85°C, advanced
security protocols (WEP 64- and 128-bit, WPA,
802.1x-LEAP, integrated AES/CCMP), RoHS compliance and regula- 
tory pre-certifications allow OEMs to leverage Quatech’s license grant 
and bypass most or all regulatory testing. List price is $129. 


Quatech, Hudson, OH. (330) 655-9000. [www.quatech.com]. 





Total Power: 168 Watt with ATX Interface 
+3.3V, +5V, 12V output 
6V to 40V DC input range 
Extended temperature: -40°C to +85°C 
RS232 serial port / Optocoupled inputs 
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Competition Heats Up 
in the Battle for COTS 
Processing Technologies 


FPGAs have long been paying their dues for acceptance into military appli- 
cations, and at the moment, they appear to have the advantage over newer 
technologies, such as multicore processors. 


by Neil Harold 
Nallatech 


Although traveling means different things to different people,
there is generally a consistent theme to it whether it’s
the daily commute, going on vacation or traveling on busi- 
ness. You have an intended route, a rough idea of how long it 
should take and of course a destination. In the technology world, 
however, things are not quite so straightforward. Innovation and 
technical breakthroughs cannot be predicted and can lead
whole industries on a journey with no defined route, no concept 
of duration and no clue as to the destination. 

The computing world is approaching a key juncture on just such 
an endless journey. Moore’s Law has relentlessly pushed forward 
processing technology for the last 40 years, but there is doubt as to 
whether the next batch of technical and physical obstacles to Moore’s 
Law can be overcome. Hence, there is huge interest in assessing al- 
ternative and breakthrough approaches to delivering performance 
increases to satisfy the insatiable demands of the user community. 

Nowhere are those demands more stringent than in the Defense 
Embedded arena, where increased emphasis on persistent surveillance
is driving growing interest in technologies that can offer superior
sensor processing performance in C4ISR applications. However,
delivering on military programs is about much more than the techni- 
cal specifications vendors can offer. The performance metrics come 
with very strict size, weight and power (SWAP) constraints, while 








Figure 1: Schematic of a butterfly function, which buffers
input data through memory and processes it
against stored sine/cosine values.
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obsolescence management and risk mitigation arguably rank higher 
than performance as the driving forces behind decision making. 

Multicore processors and fully programmable architectures 
are the two highest-profile technology solutions to these process- 
ing challenges, with Field Programmable Gate Arrays (FPGAs) 
and the Cell Broadband Engine poised to compete for military 
programs and applications being developed over the next three 
years and beyond. Clearly both technologies offer processing 
capability that could deliver against the military’s demands for 
enhanced battlefield intelligence, but how does each respond to 
the wider requirements of the Defense Embedded market? 

The concerns around risk in military programs drive a con- 
servative approach to technology adoption, particularly where 
field deployment is concerned. On this basis, comparing FPGAs 
with the Cell highlights a great irony—FPGAs, for so long con- 
sidered a disruptive influence, now have the look of a proven and 
mature technology, having gained significant traction in military 
applications in recent years. A number of ongoing initiatives 
within organizations such as VITA are testament to this maturity 
and, sitting alongside the Cell, it puts into perspective the extent 
to which FPGAs have achieved genuine acceptance and adoption 
within the military community. 

Advocates of the Cell, on the other hand, must accept that the 
life cycle for new technologies breaking into Defense and mili- 
tary deployment programs is at least 3-4 years, regardless of the 
pledged performance benefits or those promising them. This is a 
fundamental of doing business in the Defense Embedded market, 
which is an area where FPGA-based technology, through full in- 
system upgrades, delivers against the life-cycle requirements of 
many target platforms and applications. 

Risk mitigation also manifests itself in a keen interest in ease of 
use, predictability and backward compatibility. With familiar RISC


Table 1

Device                  Clock     GFLOPs   Power    GFLOPs/W   Availability
Pentium-4               3.2 GHz   6.4      --       0.08       Available Now
TI DSP (TMS320C67x)     300 MHz   1.5      1.64 W   0.91       Available Now
Cell BE                 3.2 GHz   200      60 W     3.33       License from IBM
Virtex-4-FX140-11       350 MHz   67.9     18 W     3.77       Available Now

This illustrates the step change in performance offered by new processing architectures like FPGAs and the Cell BE. It is this
performance increment that gains significant interest from defense embedded users in these and other new technologies.


technology running behind the scenes, multicore processors should 
benefit from being able to support existing coding standards as well 
as legacy code. However, given that the purpose of adopting this type 
of technology is to gain a performance benefit, users must be able to 
take advantage of the parallel architecture, and this requires advanced 
compilers, which take time to develop and prove. This proving process
is well known to the FPGA community, which has worked hard
in recent years to aid usability and support legacy software code by
developing new high-level tools. Consequently, there are now more
design tools than ever available for FPGAs, running from a variety of
environments familiar to the software and DSP engineer. 

The approach to obsolescence management distinguishes 
military developers from their commercial counterparts, and 
while COTS initiatives and directives have helped ease the burden 
in recent years, it remains a critical factor in the defense procurement
process. Any consideration given to new technology is accompanied
by a question regarding projected availability in 10
years’ time, which must be satisfactorily answered. The advent of
the military Evolutionary Acquisition Strategy means even more 
emphasis will be placed on technology roadmaps and long-term 
availability. In many ways the Cell’s background in the gaming 
industry will serve as good preparation for this harsh reality of 
doing business in the defense industry—new technologies are ulti- 
mately sustained by volume. 

A challenge for the Cell in the defense space will be to demon- 
strate volume potential early enough in the adoption cycle to promote 
its acceptance—failure to do so will make securing deployment con- 
tracts even more difficult than it already is with a new technology. 
It is unclear how much assistance the Cell can expect from the
traditionally volatile gaming industry, where technology refresh cycles
are far shorter than the military’s, in securing long-term stability.
FPGAs have overcome the hurdle of obsolescence
through vendors such as Xilinx and Altera demonstrating
consistency and code compatibility through multiple device genera- 
tions, leading to deployment in a variety of military programs such 
as the Joint Tactical Radio System (JTRS). 

While these are all areas of huge concern to the military devel- 
oper, the overriding reason for the interest in these technologies is 
the belief that they will deliver significant performance increments. 
With headline performance figures of 200 GFLOPs for the Cell BE, 
it’s easy to understand the interest. Similarly, FPGAs are known to 
deliver significant performance benefits, especially where SWAP is 
a big concern. Headlines are just that, however, and it’s worth con- 
sidering an example to get a sense for the type of performance a 
developer can expect when implementing a real-world application. 


The Fast Fourier Transform (FFT) is a fundamental build- 
ing block of digital signal processing and is applied across a huge 
variety of military applications where data is captured in the time 
domain but processed in the frequency domain. At the heart of the 
FFT is a function known as the “butterfly,” which works by taking 
the real and imaginary parts of each sample of the time domain 
signal and multiplying each of them by a sine or cosine term. This 
butterfly is then repeated many times to construct the full FFT. 

The physical implementation of the butterfly is likely to see 
the sin/cosine values stored in local memory, with the incom- 
ing sample data also buffered through memory (Figure 1). This 
means that to perform a single butterfly function, 10 memory 
accesses are required—4 to fetch the complex data (2 x real and 
imaginary parts), 2 to fetch the complex sin/cosine values and 4 
to store the complex results. 
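
The memory traffic described above can be made concrete with a short sketch. This is the textbook radix-2 decimation-in-time butterfly, not necessarily the exact arrangement used on either device; the function name `butterfly`, the separate real/imaginary arrays and the twiddle tables `tw_re`/`tw_im` are all assumptions made for illustration. Each load and store is annotated so the ten accesses can be counted directly.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook radix-2 butterfly on samples a and b with twiddle factor k.
 * re[]/im[] hold the real and imaginary parts of the data; tw_re[]/tw_im[]
 * hold the stored cosine and sine terms. All names are illustrative.
 * Returns the number of scalar memory accesses performed. */
int butterfly(float *re, float *im, size_t a, size_t b,
              const float *tw_re, const float *tw_im, size_t k)
{
    assert(a != b);

    /* 4 loads: real and imaginary parts of the two complex inputs */
    float ar = re[a], ai = im[a];
    float br = re[b], bi = im[b];

    /* 2 loads: the complex sine/cosine (twiddle) value */
    float wr = tw_re[k], wi = tw_im[k];

    /* complex product t = w * b */
    float tr = wr * br - wi * bi;
    float ti = wr * bi + wi * br;

    /* 4 stores: the two complex results a + t and a - t */
    re[a] = ar + tr;
    im[a] = ai + ti;
    re[b] = ar - tr;
    im[b] = ai - ti;

    return 4 + 2 + 4;
}
```

Note that this textbook form needs 4 multiplications and 6 additions, so the operation count quoted for a specific hardware implementation may reflect a different arrangement of the arithmetic.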

In an FPGA, the butterfly is comfortably implemented using 
the array of configurable memory blocks and embedded multipliers. 
In a Xilinx Virtex-4-FX140-11 for example, 552 configurable mem- 
ory blocks and 192 embedded DSP blocks allow for 8 simultaneous 
butterfly calculations to be carried out per clock cycle. Running at 
a clock speed of 350 MHz this provides an overall performance of 
33.6 GFLOPs (6 multiplications and 6 additions per butterfly). 
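
The 33.6 GFLOPs figure follows from simple arithmetic: 8 butterflies per clock, 12 floating-point operations per butterfly (6 multiplications plus 6 additions) and a 350 MHz clock. A minimal sketch of that calculation, with the hypothetical helper name `peak_gflops`:

```c
#include <assert.h>

/* Peak throughput in GFLOPs for a design that completes a fixed number of
 * butterflies on every clock cycle. The helper name is illustrative. */
double peak_gflops(int butterflies_per_clock, int flops_per_butterfly,
                   double clock_mhz)
{
    assert(butterflies_per_clock > 0);
    assert(flops_per_butterfly > 0);
    assert(clock_mhz > 0.0);

    /* operations per cycle times cycles per microsecond gives MFLOPs */
    double mflops = butterflies_per_clock * flops_per_butterfly * clock_mhz;
    return mflops / 1000.0;
}
```

Calling `peak_gflops(8, 12, 350.0)` reproduces the 33.6 GFLOPs quoted above.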

Each of the eight synergistic processing units (SPUs) within 
the Cell has a local dual-port 256 Kbyte memory. The memory 
bandwidth required to run the FFT butterfly function challenges 
the Cell BE architecture resulting in a computed performance of 
46.8 GFLOPs across the eight SPUs. 

This example demonstrates just how dependent performance 
is on the application, and that the data in Table 1 must be consid- 
ered in the context of the application. When taking into account 
the relative power consumption, it is clear that the fully flexible 
architecture of the FPGA out-performs the Cell in the FFT im- 
plementation, where data flow restricts the performance of the 
Cell, which is much more suited to vector processing. 
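
The power-relative comparison can be worked through explicitly. The sketch below assumes that the device power figures listed in Table 1 (18 W for the Virtex-4 and 60 W for the Cell BE) also apply while running the FFT, which the article does not state; `gflops_per_watt` is a hypothetical helper.

```c
#include <assert.h>

/* Performance per watt; a plain ratio, kept as a function so the two
 * devices are compared on the same basis. */
double gflops_per_watt(double gflops, double watts)
{
    assert(watts > 0.0);
    return gflops / watts;
}
```

Under that assumption, `gflops_per_watt(33.6, 18.0)` is about 1.87 for the FPGA and `gflops_per_watt(46.8, 60.0)` is 0.78 for the Cell, so the Cell's higher raw FFT throughput turns into a lower figure once power is taken into account.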

Reading the signposts at this particular crossroads, it’s not 
yet clear which direction will be taken. Perhaps the best path to 
take is one that supports a blend of processing technologies, pro- 
viding users with the power and flexibility of FPGAs alongside 
conventional processors where appropriate, in order to precisely 
match the platform to the needs of the application.
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Flexible Multicore 
Pipeline Infrastructure 


Reduces Software Pain 


A fixed pipeline puts the entire code partitioning burden on 
software. A pipeline in an FPGA allows a simple code partition, 
with a pipeline designed to suit the partition. 


by Bryon Moyer 
Teja Technologies 


It is common these days to talk about adapting single-processor
programs into multicore architectures. Less attention
is given to the opposite: adapting multicore architectures to
meet the needs of an erstwhile single-processor program. While
software engineers optimize software to meet some hardware
constraint, a new possibility is emerging that allows that same
engineer, with much less work, simply to change the hardware to
eliminate the constraint and be done. This is being put into practice
for high-speed multicore packet processing.

Multicore architectures can come in many forms; packet
processing tends to be solved with a parallel pipeline structure
(Figure 1). In this kind of structure, a program that might have
been executed as a single unit is broken up into a series of
subprograms, each of which executes in a stage. The process of
deciding where to break the program up can have a major impact on
the project duration, especially if initial partitions turn out to be
bad and need to be redone. Simplifying program partitioning can
go a long way toward speeding up the completion of a project and
the delivery of product to market, and provides a clear example
of a situation where changing the hardware can obviate weeks of
software optimization time.

Any packet processing project will have a “line rate” requirement
that specifies the bit rate that must be accommodated.
If an initial partition of the program fails to meet line rate and
must be redone, significant time can be lost. The partition of the
program depends on the pipeline being targeted. Using a fixed
hardware architecture means that a program must be partitioned
into a fixed number of equal sections, and software changes
make repartitioning likely. If instead there is flexibility in the
hardware, the chances of having to repartition the program drop
dramatically.

Partitioning for a Fixed Architecture 

A high-level view of a simplified forwarding program pro- 
vides a good example, and would look like Code Example 1. A 
program like this would be included within a polling loop that 
looks for incoming packets and processes them as they arrive. 
Note that for clarity, we will use a simplified view of packet for- 
warding to describe partitioning principles. 

Now what happens if this program needs to be adapted to the 
specific pipeline shown in Figure 2, for example? If no code beyond 
the high-level code above has been written yet, then the designer has to 


Figure 1: A generalized parallel pipeline. Each stage
performs a portion of a task, in order, moving from
left to right. There can be multiple stages and
multiple rows.


Figure 2: A 3x2 parallel pipeline. If this is a fixed
architecture, then six engines must suffice for the
desired performance, and the program must be
broken into at most three pieces.
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struct packet { 
<header fields> 
<payload> 


};


void forwardPacket(struct packet *PacketPtr) {
    filterPacket(PacketPtr);
    decapPacket(PacketPtr);
    classifyPacket(PacketPtr);
    validatePacket(PacketPtr);
    getNextHop(PacketPtr);
    encapPacket(PacketPtr);
}
Code Example 1


guess where the best partitioning points will be. But being wrong could 
entail a lot of rework late in the project, so most likely the designer would 
want to write the entire program first and then do cycle profiling. 

In order to work quantitatively with pipelines, we have to know the 
cycle budget, which is the amount of time that the engine has to operate 
on a packet. For 1 Gbit/s traffic, clocking our processors at 100 MHz,
we end up with a 67-cycle budget. Still, it is not possible to process an 
entire packet in 67 cycles, which is why we use multiple processors. 
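
The article does not show how the 67-cycle figure is derived. One derivation that produces it, offered here as an assumption, is to take the worst case of back-to-back minimum-size Ethernet frames: 64 bytes of frame plus 8 bytes of preamble and 12 bytes of inter-frame gap, or 672 bits on the wire. The helper name `cycle_budget` is hypothetical.

```c
#include <assert.h>

/* Whole processor cycles available per packet at line rate.
 * Packet time in microseconds is bits / Mbit/s; multiplying by the clock
 * in MHz gives cycles. Integer division rounds the budget down. */
long cycle_budget(long wire_bits_per_packet, long line_rate_mbps,
                  long clock_mhz)
{
    assert(wire_bits_per_packet > 0);
    assert(line_rate_mbps > 0);
    assert(clock_mhz > 0);
    return wire_bits_per_packet * clock_mhz / line_rate_mbps;
}
```

With those assumed numbers, `cycle_budget(84 * 8, 1000, 100)` gives 67 cycles, matching the budget used in the rest of the discussion.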

Each processor multiplies the effective cycle budget for an in- 
dividual processor. If the overall program takes 365 cycles, then 
365/67=6 (rounded up) processors will be needed. If we were target- 
ing a four-engine pipeline, we would know that the hardware could 
not support the needed performance. Since we are targeting a six-en- 
gine pipeline, this rough analysis says we should be okay. There are 
two processors per stage in Figure 2, so each processor gets twice 
the cycle budget, or 134 cycles. In other words, one engine can take 
twice as long as the cycle budget, but the second engine picks up the 
next packet when it arrives, as illustrated in Figure 3. 
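
The rough sizing argument above reduces to two small calculations, sketched here with hypothetical helper names: a ceiling division for the total number of engines, and a multiplication for the budget each engine enjoys when several engines share a stage.

```c
#include <assert.h>

/* Minimum number of engines needed to absorb program_cycles of work per
 * packet when each engine contributes one cycle budget (rounded up). */
long engines_needed(long program_cycles, long budget)
{
    assert(program_cycles > 0 && budget > 0);
    return (program_cycles + budget - 1) / budget;
}

/* Cycles each engine may spend on a packet when engines_in_stage engines
 * take turns accepting packets within one stage. */
long per_engine_budget(long budget, long engines_in_stage)
{
    assert(budget > 0 && engines_in_stage > 0);
    return budget * engines_in_stage;
}
```

For the numbers in the text, `engines_needed(365, 67)` is 6, which rules out a four-engine pipeline, and `per_engine_budget(67, 2)` is 134 cycles for the two-row arrangement of Figure 2.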


[Timing diagram: packet arrivals against Stage 1/1, Stage 1/2, Stage 2/1 and Stage 2/2]

Figure 3: By adding engines in parallel within a stage, each
stage can handle more packets without dropping,
effectively multiplying the cycle budget by the
number of engines in the stage.





Figure 4: If one stage is longer than another and is longer
than the cycle budget, as is the case with stage 2
here, that stage will become a bottleneck and will
ultimately cause packets to be dropped.


Figure 5: The basic components of an engine. Private code
store, data store and offloads (if used) dramatically
reduce the amount of contention for these frequently
used resources.


Because each stage of the pipeline has the same number of en- 
gines, each stage should take on an equal number of cycles. When 
each stage has the same cycle count, then each stage can hand its 
completed task off to the next stage in time to receive its new task. 

By contrast, one can picture an unbalanced pipeline where the first 
stage does 100 cycles and the second stage does 168 cycles’ worth of 
work. Together they do 268 cycles, the same amount of work as if they 
both did an equal 134 cycles. So it would seem that this would still 
work, since together they do the same amount of work as the first two 
stages of the balanced pipeline. But a new packet will arrive at the first 
processor of the first stage every 134 cycles, while the second stage 
finishes every 168 cycles. At some point the second stage is going to get 
behind; Figure 4 shows that the fourth packet will be dropped. 
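
The way the imbalance accumulates can be checked with a small simulation of one row of the pipeline. The model is an assumption, not taken from the article: there is no buffering between stages, a stage-1 engine must hold its finished packet until the stage-2 engine is free, and a packet that arrives while stage 1 is still occupied is dropped. The function name `first_dropped` is hypothetical.

```c
#include <assert.h>

/* Returns the 1-based index of the first dropped packet, or 0 if none of
 * the first n_packets is dropped. Packets reach this row every `period`
 * cycles. Model assumptions: one engine per stage in the row, no buffering,
 * blocking hand-off from stage 1 to stage 2. */
int first_dropped(long period, long stage1_cycles, long stage2_cycles,
                  int n_packets)
{
    assert(period > 0 && stage1_cycles > 0 && stage2_cycles > 0);

    long stage1_free = 0;   /* earliest time stage 1 can accept a packet */
    long stage2_free = 0;   /* earliest time stage 2 can accept a packet */

    for (int p = 1; p <= n_packets; p++) {
        long arrival = (long)(p - 1) * period;
        if (arrival < stage1_free)
            return p;       /* stage 1 still holds the previous packet */

        long done1 = arrival + stage1_cycles;
        long handoff = done1 > stage2_free ? done1 : stage2_free;

        stage1_free = handoff;              /* released only at hand-off */
        stage2_free = handoff + stage2_cycles;
    }
    return 0;
}
```

Under these assumptions, `first_dropped(134, 100, 168, 100)` returns 4, which agrees with the fourth-packet drop described for Figure 4, while the balanced case `first_dropped(134, 134, 134, 100)` returns 0. A model with buffering between stages would delay the first drop but not prevent it.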

This shows that we have to balance the code into three equal 
pieces, one piece per stage. So in theory, we have a clean way to 
make this program work at line rate without dropping packets. But 
in reality, how easy is it to partition in exactly the right place? Will 
such partitions create extra overhead when the packet moves from 
stage to stage? Are we breaking in the middle of a deeply nested 
function, or in the middle of a fast loop? And what happens if we 
add more code to the program later such that we create an imbal- 
ance? Such scenarios create the possibility of having to change the 
partitioning, or worse yet, having to move to a new architecture 
with more engines, which will entail a complete new partition. 


Flexible Partitioning in Flexible Hardware 


An FPGA doesn’t mandate a particular configuration of a 
pipeline. Using a soft processor core like the MicroBlaze core 


Figure 6: The first step in designing a pipeline is to do the
partition. But it is unlikely that a single row reflecting
that partition will meet throughput requirements, and
because a single row is balanced, an unbalanced
partition will create bottlenecks.














Figure 7: Adding engines to each stage, proportional to
the cycle count of each stage, results in an
unbalanced pipeline that matches the imbalances
in the cycle counts such that no bottlenecks are
created. A load-balancing channel is critical to
making this work.


from Xilinx, one can create a processing engine that 
is self-contained and can be replicated (Figure 5) as 
many or few times as needed. The number of proces- 
sors required will be different for different designs. 

Let’s take a new look at the program partition 
if we can build our own pipeline. We know we need about six 
engines, but that doesn’t tell us how those six should be arranged. 
We could have a six-stage pipeline with one engine in each stage, 
or a 2x3 or a 3x2, etc. The question of how many stages to create 
is one of timing and hardware resources. Adding more engines in 
parallel reduces latency but can use more resources; adding more 
stages can use fewer resources at the expense of increased la- 
tency. In addition, changing the number of stages for a design that 
has already been partitioned means redoing the partition; adding 
engines in parallel does not require any program changes. 

With partitioning flexibility we can simply pick “natural” 
places for a partition. In this case, we will partition into two stages, 
putting the packet “acceptance” routines (filtering, decapsulation, 
classification and validation) in one stage and the packet “forward- 
ing” routines (next-hop lookup and encapsulation) in a second 
stage. We can now create two programs from the original. Adding 
the loop logic and the APIs for receiving and sending from a chan- 
nel yields the two simplified programs in Code Example 2. 

It is very likely that these two programs will not have the same 
cycle counts. Let’s assume that the first program takes 250 cycles and 
the second takes 115 cycles (ignoring the fact that the sum of the cycle 
counts of two programs will actually be slightly different from the cycle 
count of the original combined program). A single-row two-stage pipe- 
line would now be executing at 250 and 115 cycles for the two stages, 
neither of which meets the cycle budget of 67 cycles (Figure 6). 

At this point we can add engines in parallel to raise the 
throughput. For the first stage, we need 250/67=4 engines; for the 
second stage we need 115/67=2 engines. Now we have an irregu- 
lar pipeline as shown in Figure 7. We place a piece of logic be- 
tween the stages that acts as a load-balancing channel. This logic 
can move packets from any engine in the first stage to any engine 
in the second stage; this is critical for such an unbalanced pipeline 


void acceptPacket(void) {
    struct packet *PacketPtr;

    while (1) {
        channelReceive(inChannel, PacketPtr); /* API */
        filterPacket(PacketPtr);
        decapPacket(PacketPtr);
        classifyPacket(PacketPtr);
        validatePacket(PacketPtr);
        channelSend(outChannel, PacketPtr); /* API */
    }
}


void sendPacket(void) {
    struct packet *PacketPtr;

    while (1) {
        channelReceive(inChannel, PacketPtr); /* API */
        getNextHop(PacketPtr);
        encapPacket(PacketPtr);
        channelSend(outChannel, PacketPtr); /* API */
    }
}
Code Example 2


to work. The unbalanced pipeline is able to handle all packets at 
line rate. Even though there are fewer engines in the second stage, 
there is also less processing in the second stage, so that the exist- 
ing engines free up faster and can be reused more frequently. 
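
Sizing each stage independently is a per-stage ceiling division against the cycle budget. The sketch below uses a hypothetical helper, `size_pipeline`, that fills in the engine count for every stage and returns the total; it is an illustration of the arithmetic, not Teja's tooling.

```c
#include <assert.h>
#include <stddef.h>

/* For each stage i, choose the smallest engines[i] such that
 * engines[i] * budget >= stage_cycles[i]. Returns the total engine count. */
long size_pipeline(const long *stage_cycles, size_t n_stages, long budget,
                   long *engines)
{
    assert(stage_cycles != NULL && engines != NULL);
    assert(budget > 0);

    long total = 0;
    for (size_t i = 0; i < n_stages; i++) {
        assert(stage_cycles[i] > 0);
        engines[i] = (stage_cycles[i] + budget - 1) / budget; /* round up */
        total += engines[i];
    }
    return total;
}
```

For stage costs of 250 and 115 cycles and a 67-cycle budget this yields 4 and 2 engines, six in total. The effective per-stage budgets are then 268 and 134 cycles, both of which cover the work assigned to the stage.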

This illustrates that a flexible hardware platform like an FPGA, 
equipped with a flexible multicore infrastructure, can provide a means 
of circumventing what would otherwise be a lengthier partitioning 
process. It is also more robust in the face of changes: if more code 
is added or deleted in a manner that changes the balance of engines, 
engines can be added, dropped, or moved from one stage to another. 

Teja has designed such an infrastructure and a means for 
assembling it using only an ANSI C program. A pre-existing 
hardware configuration program has been parameterized so that 
the pipeline topology can be modified simply by editing a few 
numbers in a C header file. The pipeline above is implemented by 
changing two simple parameters: 

#define PIPELINE_LENGTH 2

#define PIPELINE_CONFIG {4,2}


If future code changes require changes in pipeline configu- 
ration, these numbers can simply be tweaked further to adjust to 
the new pipeline. 

Based on this infrastructure, changing the hardware to meet the 
needs of the software partitions is far easier and takes far less time than 
having to partition carefully to meet the constraints of a pre-defined 
pipeline. It also reduces the chances of having to rework significant 
pieces of the software late in the project as the deadline looms.


Teja Technologies 
San Jose, CA. 
(408) 288-2560. 
[www.teja.com]. 
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MM-15x0 & MM-16x0 
* Two CoSine 2VP70™ System-on-Chips 


© Two 3}. XILINX’V-4 SX55 or LX160 
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